Intelligence

Scientists have proposed two major “consensus” definitions of intelligence: 

(i) from Mainstream Science on Intelligence (1994);

A very general mental capability that, among other things, involves the ability to reason, plan, solve 
problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It 
is not merely book learning, a narrow academic skill, or test-taking smarts. Rather, it reflects a 
broader and deeper capability for comprehending our surroundings: "making sense" of things, or
"figuring out" what to do.

(ii) from Intelligence: Knowns and Unknowns (1995); 

Individuals differ from one another in their ability to understand complex ideas, to adapt effectively 
to the environment, to learn from experience, to engage in various forms of reasoning, [and] to 
overcome obstacles by taking thought. Although these individual differences can be substantial, 
they are never entirely consistent: a given person's intellectual performance will vary on different 
occasions, in different domains, as judged by different criteria. Concepts of "intelligence" are 
attempts to clarify and organize this complex set of phenomena. 

Thus, intelligence is: 

- the ability to reason 

- the ability to understand 

- the ability to create 

- the ability to learn from experience

- the ability to plan and execute complex tasks 


What is Artificial Intelligence? 

"Giving machines ability to perform tasks normally associated with human intelligence." 

AI is the intelligence of machines and the branch of computer science that aims to create it. AI
consists of the design of intelligent agents, where an agent is a program that perceives its
environment and takes actions that maximize its chance of success. AI raises issues such as
deduction, reasoning, problem solving, knowledge representation, planning, learning, natural
language processing, and perception.

"Artificial Intelligence is the part of computer science concerned with designing intelligent
computer systems, that is, systems that exhibit the characteristics we associate with intelligence
in human behavior."

Different books and writers give different definitions of AI. These definitions can be organized
along two dimensions.
Systems that think like humans (top left)

"The exciting new effort to make computers think ... machines with minds, in the full and literal
sense." (Haugeland, 1985)

"[The automation of] activities that we associate with human thinking, activities such as
decision-making, problem solving, learning." (Bellman, 1978)

Systems that think rationally (top right)

"The study of mental faculties through the use of computational models." (Charniak and McDermott,
1985)

"The study of the computations that make it possible to perceive, reason, and act." (Winston, 1992)

Systems that act like humans (bottom left)

"The art of creating machines that perform functions that require intelligence when performed by
people." (Kurzweil, 1990)

"The study of how to make computers do things at which, at the moment, people are better." (Rich
and Knight, 1991)

Systems that act rationally (bottom right)

"Computational Intelligence is the study of the design of intelligent agents." (Poole et al., 1998)

"AI ... is concerned with intelligent behavior in artifacts." (Nilsson, 1998)


The top row is concerned with thought processes and reasoning, whereas the bottom row addresses
behavior.

The definitions on the left measure success in terms of fidelity to human performance, whereas the
definitions on the right measure success against an ideal concept of intelligence, called
rationality.

Human-centered approaches must be an empirical science, involving hypotheses and experimental
confirmation. A rationalist approach involves a combination of mathematics and engineering.

Acting Humanly: The Turing Test Approach 

The Turing test, proposed by Alan Turing (1950), was designed to give an operational answer to the
question of whether a particular machine can think. He suggested a test based on
indistinguishability from undeniably intelligent entities: human beings. The test involves an
interrogator who interacts with one human and one machine. Within a given time the interrogator
has to find out which of the two is the human and which is the machine.

[Figure: Turing test setup; surviving labels: HUMAN, INTERROGATOR]

The computer passes the test if a human interrogator, after posing some written questions, cannot
tell whether the written responses come from a human or not.

To pass the Turing test, a computer must have the following capabilities:

> Natural Language Processing: To communicate successfully in English.

> Knowledge representation: To store what it knows and hears.

> Automated reasoning: To answer questions based on the stored information.

> Machine learning: To adapt to new circumstances.


The Turing test deliberately avoids direct physical interaction between the interrogator and the
computer, because physical simulation of a human being is not necessary for testing intelligence.

The total Turing test includes video signals and manipulation capability so that the interrogator
can test the subject's perceptual abilities and object manipulation ability. To pass the total
Turing test, a computer must have the following additional capabilities:

> Computer Vision: To perceive objects 

> Robotics: To manipulate objects and move 


Thinking Humanly: Cognitive modeling approach 

If we are going to say that a given program thinks like a human, we must have some way of 
determining how humans think. We need to get inside the actual workings of human minds. 

There are two ways to do this: 

- through introspection: catching our own thoughts as they go by

- through psychological experiments.

Once we have a sufficiently precise theory of the mind, it becomes possible to express the theory
as a computer program.

The field of cognitive science brings together computer models from AI and experimental
techniques from psychology to try to construct precise and testable theories of the workings of the
human mind.

Thinking Rationally: The laws of thought approach

Aristotle was one of the first to attempt to codify "right thinking", that is, irrefutable
reasoning processes. His syllogisms always yield correct conclusions when given correct premises.

For example:

Ram is a man
All men are mortal

=> Ram is mortal

These laws of thought were supposed to govern the operation of the mind. Their study initiated the
field of logic. The logicist tradition in AI hopes to create intelligent systems using logic
programming. However, there are two obstacles to this approach. First, it is not easy to take
informal knowledge and state it in the formal terms required by logical notation, particularly when
the knowledge is less than 100% certain. Second, solving a problem in principle is different from
solving it in practice. Even problems with just a few dozen facts may exhaust the computational
resources of any computer unless it has some guidance as to which reasoning steps to try first.
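
The syllogism above can be mechanized as a tiny rule-based inference. The following is only a minimal sketch in Python; the names `facts`, `rules`, and `infer` are illustrative assumptions, not part of any standard library.

```python
# Facts are (predicate, subject) pairs; a rule (P, Q) reads "every P is also a Q".
facts = {("man", "Ram")}
rules = [("man", "mortal")]  # All men are mortal


def infer(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, subject in list(derived):
                if predicate == premise and (conclusion, subject) not in derived:
                    derived.add((conclusion, subject))
                    changed = True
    return derived


print(("mortal", "Ram") in infer(facts, rules))  # True
```

With many rules and facts, the nested loops try every combination on every pass, which hints at why unguided inference becomes expensive.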

Acting Rationally: The rational Agent approach: 

An agent is something that acts.

A computer agent is expected to have the following attributes:

> Autonomous control

> Perceiving its environment

> Persisting over a prolonged period of time

> Adapting to change

> Being capable of taking on another's goals

Rational behavior: doing the right thing.

The right thing: that which is expected to maximize goal achievement, given the available
information.

A rational agent is one that acts so as to achieve the best outcome or, when there is uncertainty,
the best expected outcome.

In the "laws of thought" approach to AI, the emphasis was on correct inferences. Making correct
inferences is sometimes part of being a rational agent, because one way to act rationally is to
reason logically to a conclusion and then act on that conclusion. On the other hand, there are also
ways of acting rationally that cannot be said to involve inference. For example, recoiling from a
hot stove is a reflex action that is usually more successful than a slower action taken after
careful deliberation.

Advantages: 
> It is more general than the laws of thought approach, because correct inference is just one of
several mechanisms for achieving rationality.

> It is more amenable to scientific development than are approaches based on human
behavior or human thought, because the standard of rationality is clearly defined and
completely general.


Characteristics of A.I. Programs 

• Symbolic Reasoning: reasoning about objects represented by symbols, and their properties 
and relationships, not just numerical calculations. 

• Knowledge: General principles are stored in the program and used for reasoning about 
novel situations. 

• Search: a "weak method" for finding a solution to a problem when no direct method exists.
Problem: combinatorial explosion of possibilities.

• Flexible Control: Direction of processing can be changed by changing facts in the
environment.

Foundations of AI:


Philosophy: 

Logic, reasoning, mind as a physical system, foundations of learning, language and rationality. 

> Where does knowledge come from? 

> How does knowledge lead to action? 

> How does the mind arise from a physical brain?

> Can formal rules be used to draw valid conclusions? 


Mathematics: 

Formal representation and proof algorithms, computation, undecidability, intractability, 
probability. 

> What are the formal rules to draw the valid conclusions? 

> What can be computed? 

> How do we reason with uncertain information? 

Psychology: 

Adaptation, phenomena of perception and motor control. 


> How do humans and animals think and act?

Economics:

Formal theory of rational decisions, game theory, operations research.

> How should we make decisions so as to maximize payoff?

> How should we do this when others may not go along? 

> How should we do this when the payoff may be far in future? 

Linguistics: 

Knowledge representation, grammar 

> How does language relate to thought? 

Neuroscience : 

Physical substrate for mental activities 

> How do brains process information? 

Control theory : 

Homeostatic systems, stability, optimal agent design 

> How can artifacts operate under their own control? 


Brief history of AI

- 1943: Warren McCulloch and Walter Pitts: a model of artificial Boolean neurons to perform
computations.

- First steps toward connectionist computation and learning (Hebbian learning).

- Marvin Minsky and Dean Edmonds (1951) constructed the first neural network computer.

- 1950: Alan Turing's "Computing Machinery and Intelligence"

- First complete vision of AI.


The birth of AI (1956):

- Dartmouth Workshop bringing together top minds on automata theory, neural nets and the
study of intelligence.

- Allen Newell and Herbert Simon: the Logic Theorist (first non-numeric thinking program, used
for theorem proving)

- For the next 20 years the field was dominated by these participants.


Great expectations (1952-1969): 

- Newell and Simon introduced the General Problem Solver.

    - Imitation of human problem-solving

- Arthur Samuel (1952-) investigated game playing (checkers) with great success.

- John McCarthy (1958-):

    - Inventor of Lisp (second-oldest high-level language)

    - Logic oriented, Advice Taker (separation between knowledge and reasoning)


- Marvin Minsky (1958-):

    - Introduction of microworlds that appear to require intelligence to solve, e.g. the
      blocks world.

    - Anti-logic orientation, society of the mind.


Collapse in AI research (1966 - 1973):

- Progress was slower than expected. 

- Unrealistic predictions. 

- Some systems lacked scalability. 

- Combinatorial explosion in search. 

- Fundamental limitations on techniques and representations. 

- Minsky and Papert (1969) Perceptrons. 


AI revival through knowledge-based systems (1969 - 1979):

- General-purpose vs. domain specific 

- E.g. the DENDRAL project (Buchanan et al. 1969) 

First successful knowledge intensive system. 

- Expert systems 

- MYCIN to diagnose blood infections (Feigenbaum et al.) 

- Introduction of uncertainty in reasoning. 

- Increase in knowledge representation research. 

- Logic, frames, semantic nets,... 

AI becomes an industry (1980 - present):

- R1 at DEC (McDermott, 1982)

- Fifth generation project in Japan (1981)

- American response...


Puts an end to the AI winter.

Connectionist revival (1986 - present) (return of neural networks):

- Parallel distributed processing (Rumelhart and McClelland, 1986); backprop.


AI becomes a science (1987 - present):

- In speech recognition: hidden Markov models

- In neural networks 

- In uncertain reasoning and expert systems: Bayesian network formalism 


The emergence of intelligent agents (1995 - present): 

- The whole agent problem: 

"How does an agent act/behave embedded in real environments with continuous sensory 
inputs" 

Applications of AI; (Describe these application areas yourself) 

> Autonomous planning and scheduling 

> Game playing 

> Autonomous Control 

> Expert Systems 

> Logistics Planning 

> Robotics 

> Language understanding and problem solving 

> Speech Recognition 

> Computer Vision 


Knowledge: 

Knowledge is a theoretical or practical understanding of a subject or a domain. Knowledge is also 
the sum of what is currently known. 

Knowledge is "the sum of what is known: the body of truth, information, and principles acquired 
by mankind." Or, "Knowledge is what I know. Information is what we know." 

There are many other definitions such as: 

- Knowledge is "information combined with experience, context, interpretation, and reflection. It 
is a high-value form of information that is ready to apply to decisions and actions." (T. Davenport 
et al., 1998) 

- Knowledge is "human expertise stored in a person's mind, gained through experience, and 
interaction with the person's environment." (Sunasee and Sewery, 2002) 

- Knowledge is "information evaluated and organized by the human mind so that it can be used 
purposefully, e.g., conclusions or explanations." (Rousa, 2002) 
Knowledge consists of information that has been:

- interpreted,

- categorised,

- applied, experienced and revised.

In general, knowledge is more than just data; it consists of facts, ideas, beliefs, heuristics,
associations, rules, abstractions, relationships, and customs.

Research literature classifies knowledge as follows: 


Classification-based knowledge   » Ability to classify information
Decision-oriented knowledge      » Choosing the best option
Descriptive knowledge            » State of some world (heuristic)
Procedural knowledge             » How to do something
Reasoning knowledge              » What conclusion is valid in what situation?
Assimilative knowledge           » What its impact is?


Knowledge is important in AI for making intelligent machines. The key issues confronting the
designer of an AI system are:

Knowledge acquisition: Gathering the knowledge from the problem domain needed to solve the AI
problem.

Knowledge representation: Expressing the identified knowledge in some knowledge
representation language such as propositional logic, predicate logic, etc.

Knowledge manipulation: A large volume of knowledge has no meaning until it is processed to
deduce its hidden aspects. Knowledge is manipulated to draw conclusions from the
knowledge base.

Importance of Knowledge: 

Learning: 

Learning is concerned with the design and development of algorithms that allow computers to
evolve behaviors based on empirical data, such as sensor data. A major focus of learning is to
automatically learn to recognize complex patterns and make intelligent decisions based on
data.

A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P, improves
with experience E.

Intelligent Agents

An Intelligent Agent perceives its environment via sensors and acts rationally upon that 
environment with its effectors (actuators). Hence, an agent gets percepts one at a time, and maps 
this percept sequence to actions. 

Properties of the agent 

- Autonomous 

- Interacts with other agents plus the environment 

- Reactive to the environment 

- Pro-active (goal- directed) 


[Figure: an agent interacting with its environment; surviving label: sensors]

What do we mean by sensors/percepts and effectors/actions?

For humans:


- Sensors: Eyes (vision), ears (hearing), skin (touch), tongue (gustation), nose
(olfaction), neuromuscular system (proprioception)

- Percepts:

• At the lowest level - electrical signals from these sensors

• After preprocessing - objects in the visual field (location, textures, colors,
...), auditory streams (pitch, loudness, direction),...

- Effectors: limbs, digits, eyes, tongue,...

- Actions: lift a finger, turn left, walk, run, carry an object,...


A more specific example: Automated taxi driving system 

• Percepts: Video, sonar, speedometer, odometer, engine sensors, keyboard input, 
microphone, GPS,... 

• Actions: Steer, accelerate, brake, horn, speak/display,... 

• Goals: Maintain safety, reach destination, maximize profits (fuel, tire wear), obey laws, 
provide passenger comfort,... 

• Environment: Urban streets, freeways, traffic, pedestrians, weather, customers,... 
[Different aspects of driving may require different types of agent programs!]

Challenge!!

Compare software with an agent.
Compare a human with an agent.

Percept: The agent's perceptual inputs at any given instant.

Percept Sequence: The complete history of everything the agent has ever perceived.

The agent function is a mathematical concept that maps percept sequences to actions.

f : P* -> A

The agent function is represented internally by the agent program.

The agent program is a concrete implementation of the agent function; it runs on the physical
architecture to produce f.
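
The distinction between the agent function and the agent program can be illustrated with a table-driven agent: the table plays the role of f, and the lookup code is the program. This is only a sketch; the function name `make_table_driven_agent`, the "NoOp" default, and the small table for a two-square vacuum world are assumptions made for illustration.

```python
def make_table_driven_agent(table):
    """Return an agent program that implements f: P* -> A by table lookup."""
    percepts = []  # the percept sequence to date

    def program(percept):
        percepts.append(percept)
        # Look up the whole percept sequence; fall back to "NoOp" if it is missing.
        return table.get(tuple(percepts), "NoOp")

    return program


# Hypothetical table covering a few percept sequences of length 1 and 2.
table = {
    (("A", "Dirty"),): "Suck",
    (("A", "Clean"),): "Right",
    (("A", "Dirty"), ("A", "Clean")): "Right",
}

agent = make_table_driven_agent(table)
print(agent(("A", "Dirty")))  # Suck
print(agent(("A", "Clean")))  # Right
```

Because the key is the entire percept sequence, the table grows with every additional time step, which is why practical agent programs compute the action instead of storing it.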

The vacuum-cleaner world: Example of Agent 



Environment: square A and B 

Percepts: [location and content] E.g. [A, Dirty] 

Actions: left, right, suck, and no-op 
Percept sequence     Action

[A, Clean]           Right

[A, Dirty]           Suck

[B, Clean]           Left

[B, Dirty]           Suck


The concept of rationality

A rational agent is one that does the right thing.

- Every entry in the table is filled out correctly.

What is the right thing?

- The right action is the one that will cause the agent to be most successful.

Therefore we need some way to measure the success of an agent. Performance measures are the
criteria for success of an agent's behavior.

E.g., the performance measure of a vacuum-cleaner agent could be the amount of dirt cleaned up,
the amount of time taken, the amount of electricity consumed, the amount of noise generated, etc.

It is better to design the performance measure according to what is wanted in the environment
rather than according to how the agent should behave.

Choosing the performance measure of an agent is not an easy task. For example, if the performance
measure for an automated vacuum cleaner is "the amount of dirt cleaned within a certain time", then
a rational agent can maximize this measure by cleaning up the dirt, dumping it all on the
floor, cleaning it up again, and so on. Therefore "how clean the floor is" is a better choice of
performance measure for a vacuum cleaner.
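
The vacuum agent and a "how clean the floor is" performance measure can be simulated in a few lines. This is a sketch under stated assumptions: the scoring rule (one point per clean square at every time step) and the names `reflex_vacuum_agent` and `run` are illustrative choices, not definitions from the notes.

```python
def reflex_vacuum_agent(percept):
    """Map a single percept (location, status) to an action, following the vacuum-world table."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"


def run(world, location, steps):
    """Simulate the agent; score one point per clean square at every time step."""
    world = dict(world)  # copy so the caller's dictionary is not modified
    score = 0
    for _ in range(steps):
        action = reflex_vacuum_agent((location, world[location]))
        if action == "Suck":
            world[location] = "Clean"
        elif action == "Right":
            location = "B"
        elif action == "Left":
            location = "A"
        score += sum(1 for status in world.values() if status == "Clean")
    return world, score


final_world, score = run({"A": "Dirty", "B": "Dirty"}, "A", steps=4)
print(final_world, score)  # {'A': 'Clean', 'B': 'Clean'} 6
```

Under this measure, dumping dirt back on the floor would lower the score, so the loophole of the "amount of dirt cleaned" measure disappears.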

What is rational at a given time depends on four things: 


- Performance measure, 

- Prior environment knowledge, 

- Actions, 

- Percept sequence to date (sensors). 


Definition: A rational agent chooses whichever action maximizes the expected value of the 
performance measure given the percept sequence to date and prior environment knowledge. 


Environments 
To design a rational agent we must specify its task environment. A task environment is specified
by its PEAS description:

- Performance 

- Environment 

- Actuators 

- Sensors 

Example: Fully automated taxi: 

• PEAS description of the environment: 

Performance: Safety, destination, profits, legality, comfort 

Environment: Streets/freeways, other traffic, pedestrians, weather,...

Actuators: Steering, accelerating, brake, horn, speaker/display,... 

Sensors: Video, sonar, speedometer, engine sensors, keyboard, GPS,... 


Types of Agent 


Simple Reflex Agent 

• Table lookup of percept-action pairs defining all possible condition-action rules necessary
to interact in an environment

• Problems

- Too big to generate and to store (chess has about 10^120 states, for example)

- No knowledge of non-perceptual parts of the current state

- Not adaptive to changes in the environment; requires the entire table to be updated if
changes occur

• Use condition-action rules to summarize portions of the table

Model Based Agents (Reflex Agent with Internal State)

• Encode "internal state" of the world to remember the past as contained in earlier percepts 

• Needed because sensors do not usually give the entire state of the world at each input, so 
perception of the environment is captured over time. "State" used to encode different 
"world states" that generate the same immediate percept. 

• Requires ability to represent change in the world; one possibility is to represent just the 
latest state, but then can't reason about hypothetical courses of action 
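
A reflex agent with internal state can be sketched for the two-square vacuum world as follows. The internal model (the last known status of each square) and the "NoOp" action taken once both squares are believed clean are assumptions chosen for illustration.

```python
def make_model_based_vacuum_agent():
    """Reflex agent with internal state: remembers the last known status of each square."""
    model = {"A": None, "B": None}  # None means "not observed yet"

    def program(percept):
        location, status = percept
        model[location] = status  # update the internal state from the percept
        if status == "Dirty":
            model[location] = "Clean"  # predicted effect of the Suck action
            return "Suck"
        if model["A"] == "Clean" and model["B"] == "Clean":
            return "NoOp"  # nothing left to do, so stop moving
        return "Right" if location == "A" else "Left"

    return program


agent = make_model_based_vacuum_agent()
print(agent(("A", "Dirty")))  # Suck
print(agent(("A", "Clean")))  # Right
print(agent(("B", "Clean")))  # NoOp
```

A simple reflex agent sees the same percept (B, Clean) in two different world states and must keep moving; the remembered state lets this agent tell them apart.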
Goal-Based Agent

• Choose actions so as to achieve a (given or computed) goal. 

• A goal is a description of a desirable situation 

• Keeping track of the current state is often not enough -- need to add goals to decide which 
situations are good 

• Deliberative instead of reactive 

• May have to consider long sequences of possible actions before deciding if goal is achieved -- 
involves consideration of the future, "what will happen if I do...?" 



Utility-Based Agent

• When there are multiple possible alternatives, how do we decide which one is best?

• A goal specifies only a crude distinction between a happy and an unhappy state, but we often
need a more general performance measure that describes the "degree of happiness".

• Utility function U: States --> Reals, indicating a measure of success or happiness in a given
state.

• Allows decisions that compare conflicting goals, and that weigh the likelihood of success
against the importance of a goal (if achievement is uncertain).
[Figure: utility-based agent. Sensors -> "What the world is like now" -> "What it will be like
if I do action A" -> "How happy I will be in such a state" -> "What action I should do now" ->
Effectors. Internal components: State, How the world evolves, What my actions do, Utility.]
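
The decision rule of a utility-based agent, choosing the action whose outcomes have the highest expected utility, can be sketched as follows. The route names, probabilities, and utility values are hypothetical numbers invented for the example.

```python
def expected_utility(action, outcomes, utility):
    """Sum of probability * utility over the possible resulting states of an action."""
    return sum(p * utility[state] for state, p in outcomes[action].items())


def choose_action(outcomes, utility):
    """Pick the action with the highest expected utility."""
    return max(outcomes, key=lambda a: expected_utility(a, outcomes, utility))


# Hypothetical numbers for a taxi choosing between two routes.
utility = {"on_time": 10.0, "late": 2.0, "accident": -100.0}
outcomes = {
    "highway": {"on_time": 0.9, "late": 0.09, "accident": 0.01},
    "side_streets": {"on_time": 0.6, "late": 0.4, "accident": 0.0},
}

print(choose_action(outcomes, utility))  # highway
```

Here the highway scores 8.18 against 6.8 for the side streets, so speed outweighs the small accident risk; making the accident utility sufficiently more negative would reverse the choice, which is the kind of trade-off a plain goal cannot express.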


Learning Agent:

Refer to the book: AI by Russell and Norvig

Problem Solving:

Problem solving, particularly in artificial intelligence, may be characterized as a systematic search
through a range of possible actions in order to reach some predefined goal or solution.
Problem-solving methods divide into special purpose and general purpose. A special-purpose method
is tailor-made for a particular problem and often exploits very specific features of the situation
in which the problem is embedded. In contrast, a general-purpose method is applicable to a wide
variety of problems. One general-purpose technique used in AI is means-end analysis: a
step-by-step, or incremental, reduction of the difference between the current state and the final
goal.

Four general steps in problem solving: 

- Goal formulation 

- What are the successful world states 

- Problem formulation 

- What actions and states to consider given the goal 

- Search 

- Determine the possible sequences of actions that lead to states of
known value, and then choose the best sequence.

- Execute

- Given the solution, perform the actions.

Problem formulation: 

A problem is defined by: 

- An initial state: The state from which the agent starts.

- Successor function: A description of the possible actions available to the agent and the
states they lead to.

- Goal test: Determines whether a given state is a goal state or not.

- Path cost: The sum of the costs of the individual steps along the path from the initial state
to the given state.

A solution is a sequence of actions leading from the initial state to a goal state. An optimal
solution has the lowest path cost.

State Space representation 

The state space is commonly defined as a directed graph in which each node is a state and each arc 
represents the application of an operator transforming a state to a successor state. 

A solution is a path from the initial state to a goal state. 

State Space representation of Vacuum World Problem: 
States?? Two locations, each with or without dirt: 2 x 2^2 = 8 states.

Initial state?? Any state can be the initial state.

Actions?? {Left, Right, Suck}

Goal test?? Check whether both squares are clean.

Path cost?? Number of actions to reach the goal.
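
The vacuum-world formulation above can be written down directly and searched. The sketch below assumes a state encoding (robot location, is A dirty, is B dirty) and uses breadth-first search, one possible strategy among many; the function names are illustrative.

```python
from collections import deque
from itertools import product

# A state is (robot location, is A dirty, is B dirty): 2 x 2^2 = 8 states.
STATES = list(product("AB", [True, False], [True, False]))
ACTIONS = ["Left", "Right", "Suck"]


def result(state, action):
    """Successor function: the state reached by applying an action."""
    loc, dirty_a, dirty_b = state
    if action == "Left":
        return ("A", dirty_a, dirty_b)
    if action == "Right":
        return ("B", dirty_a, dirty_b)
    # Suck cleans the square the robot is standing on.
    return (loc, False, dirty_b) if loc == "A" else (loc, dirty_a, False)


def goal_test(state):
    """Goal test: both squares are clean."""
    return not state[1] and not state[2]


def breadth_first_search(initial):
    """Return a shortest action sequence (path cost = number of actions)."""
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, path = frontier.popleft()
        if goal_test(state):
            return path
        for action in ACTIONS:
            nxt = result(state, action)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [action]))
    return None


print(len(STATES))                              # 8
print(breadth_first_search(("A", True, True)))  # ['Suck', 'Right', 'Suck']
```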

For the following topics, refer to Russell and Norvig, Chapter 3, pages 87-96.
Problem Types: Toy Problems & Real World Problems (Discussed in class). 
Well Defined Problems (Discussed in class). 

Water Leakage Problem: 
[Figure: inference network for the water leakage problem; node labels: kitchen_dry, hall_wet,
bathroom_dry, window_closed, no_rain, leak_in_bathroom, problem_in_kitchen,
no_water_from_outside, leak_in_kitchen]

If hall_wet and kitchen_dry then leak_in_bathroom

If hall_wet and bathroom_dry then problem_in_kitchen

If window_closed or no_rain then no_water_from_outside
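
The rules of the water leakage problem can be run by simple forward chaining. This is a sketch only: the rule encoding and the name `forward_chain` are assumptions, and the fourth rule (problem_in_kitchen and no_water_from_outside imply leak_in_kitchen) is inferred from the leak_in_kitchen node in the figure rather than stated in the rule list.

```python
# Each rule is (list of alternative premise sets, conclusion).
# One premise set means AND; several alternatives mean OR.
RULES = [
    ([{"hall_wet", "kitchen_dry"}], "leak_in_bathroom"),
    ([{"hall_wet", "bathroom_dry"}], "problem_in_kitchen"),
    ([{"window_closed"}, {"no_rain"}], "no_water_from_outside"),
    # Assumed fourth rule, suggested by the leak_in_kitchen node in the figure.
    ([{"problem_in_kitchen", "no_water_from_outside"}], "leak_in_kitchen"),
]


def forward_chain(facts):
    """Fire rules until no rule adds a new fact; return everything derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for alternatives, conclusion in RULES:
            if conclusion not in known and any(p <= known for p in alternatives):
                known.add(conclusion)
                changed = True
    return known


derived = forward_chain({"hall_wet", "bathroom_dry", "window_closed"})
print(sorted(derived))
```

Starting from a wet hall, a dry bathroom, and a closed window, the chain derives problem_in_kitchen, then no_water_from_outside, and finally leak_in_kitchen.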

Production System: 
A production system (or production rule system) is a computer program typically used to
provide some form of artificial intelligence, which consists primarily of a set of rules about
behavior. These rules, termed productions, are a basic representation found useful in
automated planning, expert systems and action selection. A production system provides the
mechanism necessary to execute productions in order to achieve some goal for the system.

Productions consist of two parts: a sensory precondition (or "IF" statement) and an action
(or "THEN"). If a production's precondition matches the current state of the world, then the
production is said to be triggered. If a production's action is executed, it is said to have fired.
A production system also contains a database, sometimes called working memory, which
maintains data about the current state or knowledge, and a rule interpreter. The rule interpreter
must provide a mechanism for prioritizing productions when more than one is triggered.

The underlying idea of production systems is to represent knowledge in the form of
condition-action pairs called production rules:

If the condition C is satisfied then the action A is appropriate.

Types of production rules

Situation-action rules

If it is raining then open the umbrella.

Inference rules

If Cesar is a man then Cesar is a person.

A production system is also called a rule-based system.

Architecture of Production System:

Short Term Memory: 

Contains the description of the current state. 

Set of Production Rules: 

Set of condition-action pairs and defines a single chunk of problem solving 
knowledge. 

Interpreter: 

A mechanism to examine the short term memory and to determine which rules 
to fire (According to some strategies such as DFS, BFS, Priority, first-encounter 
etc) 
[Figure: architecture of a production system; surviving label: Rule Base]


The execution of a production system can be defined as a series of recognize-act cycles:

- Match: the contents of memory are matched against the conditions of the production rules; this
produces a subset of productions called the conflict set.

- Conflict resolution: one of the productions in the conflict set is selected.

- Apply the rule.

Consider an example: 

Problem: Sorting a string composed of letters a, b & c. 

Short Term Memory: cbaca 
Production Set: 

1. ba -> ab

2. ca -> ac

3. cb -> bc


Interpreter: Choose one rule according to some strategy. 


Iteration #   Memory   Conflict Set   Rule fired

0             cbaca    1, 2, 3        1

1             cabca    2              2

2             acbca    2, 3           2

3             acbac    1, 3           1

4             acabc    2              2

5             aacbc    3              3

6             aabcc    (empty)        halt
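
The recognize-act cycle for this sorting example can be sketched in Python. The conflict-resolution strategy used here (fire the lowest-numbered rule, rewrite the leftmost occurrence) is an assumption that happens to reproduce the trace in the table; the notes leave the strategy open.

```python
RULES = [("ba", "ab"), ("ca", "ac"), ("cb", "bc")]  # rules 1, 2, 3


def run_production_system(memory):
    """Recognize-act cycle: match, resolve conflicts, apply; repeat until no rule matches."""
    trace = []
    while True:
        # Match: numbers of all rules whose left-hand side occurs in memory.
        conflict_set = [i + 1 for i, (lhs, _) in enumerate(RULES) if lhs in memory]
        if not conflict_set:
            trace.append((memory, [], None))  # empty conflict set: halt
            return memory, trace
        # Conflict resolution (assumed strategy): fire the lowest-numbered rule.
        rule = conflict_set[0]
        trace.append((memory, conflict_set, rule))
        lhs, rhs = RULES[rule - 1]
        # Apply: rewrite the leftmost occurrence only.
        memory = memory.replace(lhs, rhs, 1)


final, trace = run_production_system("cbaca")
for row in trace:
    print(row)
```

Each printed row is (memory, conflict set, rule fired), and the run ends with memory "aabcc" and an empty conflict set.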
















Artificial Intelligence 


Production System: The water jug problem 

Problem: 

There are two jugs, a 4-gallon one and a 3-gallon one. Neither jug has any measuring 
markers on it. There is a pump that can be used to fill the jugs with water. 

How can you get exactly n ( 0, 1, 2, 3, 4) gallons of water into one of the two jugs ? 

Solution Paradigm: 

Build a simple production system for solving this problem, and
represent the problem by using the state space paradigm.

State = (x, y), where x represents the number of gallons in the 4-gallon jug and y represents the
number of gallons in the 3-gallon jug; x ∈ {0, 1, 2, 3, 4} and y ∈ {0, 1, 2, 3}.

The initial state represents the initial content of the two jugs. 

For instance, it may be (2, 3), meaning that the 4-gallon jug contains 2 gallons of water and 
the 3-gallon jug contains three gallons of water. 

The goal state is the desired content of the two jugs. 

The left hand side of a production rule indicates the state in which the rule is applicable and 
the right hand side indicates the state resulting after the application of the rule. 

For instance; 

(x, y) such that x < 4 -> (4, y) represents the production:
If the 4-gallon jug is not full then fill it from the pump.


The rule base contains the following production rules:


1. (x, y) such that x < 4 -> (4, y)
   ; Fill the 4-gallon jug from pump

2. (x, y) such that y < 3 -> (x, 3)
   ; Fill the 3-gallon jug from pump

3. (x, y) such that x > 0 -> (0, y)
   ; Empty the 4-gallon jug on the ground

4. (x, y) such that y > 0 -> (x, 0)
   ; Empty the 3-gallon jug on the ground

5. (x, y) such that x + y >= 4, x < 4, y > 0 -> (4, y - (4 - x))
   ; Completely fill the 4-gallon jug from the 3-gallon jug

6. (x, y) such that x + y >= 3, x > 0, y < 3 -> (x - (3 - y), 3)
   ; Completely fill the 3-gallon jug from the 4-gallon jug

7. (x, y) such that x + y <= 4, y > 0 -> (x + y, 0)
   ; Empty the 3-gallon jug into the 4-gallon jug

8. (x, y) such that x + y <= 3, x > 0 -> (0, x + y)
   ; Empty the 4-gallon jug into the 3-gallon jug


The short term memory contains the current state (x, y).



Shiv Raj Pant 







Let us consider the initial situation (0, 0) and the goal situation (n, 2) 


short term memory: (0, 0)

1. Match: 1, 2    2. Conflict resolution: select rule 2    3. Apply the rule

short term memory becomes (0, 3)

1. Match: 1, 4, 7    2. Conflict resolution: select rule 7    3. Apply the rule

short term memory becomes (3, 0)

1. Match: 1, 2, 3, 6, 8    2. Conflict resolution: select rule 2    3. Apply the rule

short term memory becomes (3, 3)

1. Match: 1, 3, 4, 5    2. Conflict resolution: select rule 5    3. Apply the rule

short term memory becomes (4, 2)

Goal achieved


The sequence of the applied rules: 

Fill the 3-gallon jug from pump

Empty the 3-gallon jug into the 4-gallon jug

Fill the 3-gallon jug from pump

Fill the 4-gallon jug from the 3-gallon jug
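
The water jug production system can be sketched as eight condition/action pairs plus a search over states. Two assumptions are made here: the conditions of rules 5 to 8 use ">=" and "<=", a reading chosen because it reproduces the conflict sets in the trace, and breadth-first search replaces hand-picked conflict resolution.

```python
from collections import deque

# Production rules as (number, condition, action) over a state (x, y).
RULES = [
    (1, lambda x, y: x < 4, lambda x, y: (4, y)),
    (2, lambda x, y: y < 3, lambda x, y: (x, 3)),
    (3, lambda x, y: x > 0, lambda x, y: (0, y)),
    (4, lambda x, y: y > 0, lambda x, y: (x, 0)),
    (5, lambda x, y: x + y >= 4 and x < 4 and y > 0, lambda x, y: (4, y - (4 - x))),
    (6, lambda x, y: x + y >= 3 and x > 0 and y < 3, lambda x, y: (x - (3 - y), 3)),
    (7, lambda x, y: x + y <= 4 and y > 0, lambda x, y: (x + y, 0)),
    (8, lambda x, y: x + y <= 3 and x > 0, lambda x, y: (0, x + y)),
]


def conflict_set(state):
    """Match step: numbers of the rules applicable in the given state."""
    return [n for n, cond, _ in RULES if cond(*state)]


def solve(start, is_goal):
    """Breadth-first search over states; returns a shortest list of (rule, state) steps."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for n, cond, act in RULES:
            if cond(*state):
                nxt = act(*state)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [(n, nxt)]))
    return None


print(conflict_set((3, 0)))                # [1, 2, 3, 6, 8]
print(solve((0, 0), lambda s: s[1] == 2))  # rules 2, 7, 2, 5 ending in (4, 2)
```

For the goal (n, 2) the search returns the same rule sequence as the hand trace: 2, 7, 2, 5.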


The Water Jug Problem: Representation

[Figure: the search space explored with a depth-first search strategy, starting from (0, 0) and passing through states such as (4, 0), (4, 3) and (0, 3) until the goal is achieved.]


Constraint Satisfaction Problem: 
A Constraint Satisfaction Problem is characterized by:

• a set of variables {x1, x2, ..., xn},

• for each variable xi a domain Di with the possible values for that variable, and

• a set of constraints, i.e. relations, that are assumed to hold between the values of the
variables. [These relations can be given intensionally, i.e. as a formula, or extensionally, i.e.
as a set, or procedurally, i.e. with an appropriate generating or recognizing function.] We
will only consider constraints involving one or two variables.

The constraint satisfaction problem is to find, for each i from 1 to n, a value in Di for xi so
that all constraints are satisfied. That is, we must find a value for each of the variables
that satisfies all of the constraints.


A CS problem can easily be stated as a sentence in first order logic, of the form: 


(exist x1)..(exist xn) (D1(x1) & .. & Dn(xn) => C1 & .. & Cm)

A CS problem is usually represented as an undirected graph, called Constraint Graph where the 
nodes are the variables and the edges are the binary constraints. Unary constraints can be disposed 
of by just redefining the domains to contain only the values that satisfy all the unary constraints. 
Higher order constraints are represented by hyperarcs. In the following we restrict our attention to 
the case of unary and binary constraints. 


Formally, a constraint satisfaction problem is defined as a triple (X, D, C), where X is a set of
variables, D is a domain of values, and C is a set of constraints. Every constraint is in turn a pair
(t, R), where t is a tuple of variables and R is a set of tuples of values, all these tuples having the
same number of elements; as a result R is a relation. An evaluation of the variables is a function
from variables to values, v: X → D. Such an evaluation satisfies a constraint
((x1, ..., xn), R) if (v(x1), ..., v(xn)) ∈ R. A solution is an evaluation that satisfies
all constraints.


Constraints 

• A constraint is a relation between a local collection of variables. 

• The constraint restricts the values that these variables can simultaneously have. 

• For example, all-diff(X1, X2, X3). This constraint says that X1, X2, and X3 must
take on different values. Say that {1, 2, 3} is the set of values for each of these
variables then:

X1=1, X2=2, X3=3 OK
X1=1, X2=1, X3=3 NO

The constraints are the key component in expressing a problem as a CSP.

• The constraints are determined by how the variables and the set of values are chosen. 

• Each constraint consists of; 

1. A set of variables it is over. 

2. A specification of the sets of assignments to those variables that satisfy the 
constraint. 
• The idea is that we break the problem up into a set of distinct conditions, each of
which has to be satisfied for the problem to be solved.

Example: N-Queens. Place N queens on an NxN chess board so that no queen can attack
any other queen.

• No queen can attack any other queen. 

• Given any two queens Qi and Qj they cannot attack each other. 

• Now we translate each of these individual conditions into a separate constraint. 

o Qi cannot attack Qj (i ≠ j)

• Qi is a queen to be placed in column i, Qj is a queen to be placed in 
column j. 

• The value of Qi and Qj are the rows the queens are to be placed in. 

• Note the translation is dependent on the representation we chose. 

• Queens can attack each other, 

1. Vertically, if they are in the same column—this is impossible as Qi and Qj 
are placed in different columns. 

2. Horizontally, if they are in the same row: we need the constraint Qi ≠ Qj.

3. Along a diagonal, they cannot be the same number of columns apart as they
are rows apart: we need the constraint |i - j| ≠ |Qi - Qj| (| | is absolute value).

• Representing the Constraints; 

1. Between every pair of variables (Qi, Qj) (i ≠ j), we have a constraint Cij.

2. For each Cij, an assignment of values to the variables Qi = A and Qj = B
satisfies this constraint if and only if:

A ≠ B

|A - B| ≠ |i - j|

• Solutions: 

o A solution to the N-Queens problem will be any assignment of values to the
variables Q1, ..., QN that satisfies all of the constraints.
o Constraints can be over any collection of variables. In N-Queens we only need
binary constraints, i.e. constraints over pairs of variables.
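
As a rough illustration, the binary constraint Cij and a search for a satisfying assignment can be sketched in Python. The function names are hypothetical, and plain backtracking is used here only as one simple way to find a solution; the notes above do not prescribe a particular solving method.

```python
def satisfies(i, a, j, b):
    """Constraint Cij: queens in columns i and j, placed in rows a and b,
    must not share a row or a diagonal."""
    return a != b and abs(a - b) != abs(i - j)


def solve_n_queens(n):
    """Return one assignment [Q1, ..., Qn] of rows (1-based), or None."""
    rows = []  # rows[k] is the row of the queen in column k + 1

    def extend():
        i = len(rows) + 1          # next column to assign
        if i > n:
            return True            # every variable has a value
        for a in range(1, n + 1):  # try each value in the domain of Qi
            if all(satisfies(j, rows[j - 1], i, a) for j in range(1, i)):
                rows.append(a)
                if extend():
                    return True
                rows.pop()         # undo the assignment and try the next value
        return False

    return list(rows) if extend() else None


print(solve_n_queens(4))
print(solve_n_queens(3))
```

For N = 4 this sketch returns the assignment Q1=2, Q2=4, Q3=1, Q4=3, and for N = 3 it reports that no assignment satisfies all constraints.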

More Examples: Map Coloring Problem 
Searching

A search problem 

Figure below contains a representation of a map. The nodes represent cities, and the links represent 
direct road connections between cities. The number associated to a link represents the length of 
the corresponding road. 

The search problem is to find a path from a city S to a city G 



Figure : A graph representation of a map 

This problem will be used to illustrate some search methods. 

Search problems are part of a large number of real world applications: 

VLSI layout 
Path planning 
Robot navigation etc. 

There are two broad classes of search methods: 

- uninformed (or blind) search methods; 

- heuristically informed search methods. 

In the case of the uninformed search methods, the order in which potential solution paths are 
considered is arbitrary, using no domain-specific information to judge where the solution is likely 
to lie. 

In the case of the heuristically informed search methods, one uses domain-dependent (heuristic) 
information in order to search the space more efficiently. 

Measuring problem Solving Performance 

We will evaluate the performance of a search algorithm in four ways 

• Completeness: An algorithm is said to be complete if it is guaranteed to find a solution to the
problem whenever one exists.

• Time Complexity: How long (worst or average case) does it take to find a solution?

Usually measured in terms of the number of nodes expanded 

• Space Complexity: How much space is used by the algorithm? Usually measured in 

terms of the maximum number of nodes in memory at a time 

• Optimality/Admissibility: If a solution is found, is it guaranteed to be an optimal one? 

For example, is it the one with minimum cost? 

Time and space complexity are measured in terms of

b -- maximum branching factor (number of successors of any node) of the search tree
d -- depth of the least-cost solution
m -- maximum length of any path in the space

Breadth First Search 


All nodes at a given depth in the search tree are expanded before any nodes at the next level
are expanded, until the goal is reached.

Expand the shallowest unexpanded node. The fringe is implemented as a FIFO queue.

Constraint: Do not generate a child node if that node has already appeared as a parent; this avoids loops.
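
A minimal Python sketch of breadth-first search with a FIFO fringe follows. The small graph GRAPH and the function name are hypothetical (the map figure is not reproduced here), and a visited set stands in for the loop-avoidance constraint.

```python
from collections import deque

# Hypothetical example graph: each node maps to its list of successors.
GRAPH = {
    "S": ["A", "D"],
    "A": ["B", "D"],
    "B": ["C", "E"],
    "D": ["E"],
    "E": ["F"],
    "F": ["G"],
}


def breadth_first_search(graph, start, goal):
    """Return a path with the fewest links from start to goal, or None."""
    fringe = deque([[start]])   # FIFO queue of paths
    visited = {start}           # nodes that were already generated
    while fringe:
        path = fringe.popleft()  # shallowest unexpanded node
        node = path[-1]
        if node == goal:
            return path
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                fringe.append(path + [child])
    return None


print(breadth_first_search(GRAPH, "S", "G"))
```

On this graph the search returns the four-link path S, D, E, F, G.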



Completeness:

- Does it always find a solution if one exists?

- YES, if the shallowest goal node is at some finite depth d and b is finite.

Time complexity:

- Assume a state space where every state has b successors.

- The root has b successors, each node at the next level again has b successors
(total $b^2$), and so on.

- Assume the solution is at depth d.

- Worst case: expand all except the last node at depth d.

- Total no. of nodes generated:

$b + b^2 + b^3 + \dots + b^d + (b^{d+1} - b) = O(b^{d+1})$

Space complexity:

- Each node that is generated must remain in memory.

- Total no. of nodes in memory:

$1 + b + b^2 + b^3 + \dots + b^d + (b^{d+1} - b) = O(b^{d+1})$

Optimal (i.e., admissible):

- Yes, if all steps have the same cost. Otherwise BFS is not optimal: it still finds the
solution with the shortest path length (the shallowest solution), but the shallowest
solution may not be the least-cost one.

Two lessons:

- Memory requirements are a bigger problem than execution time.

- Exponential complexity search problems cannot be solved by uninformed search
methods for any but the smallest instances.


DEPTH    NODES      TIME            MEMORY
2        1100       0.11 seconds    1 megabyte
4        111100     11 seconds      106 megabytes
6        10^7       19 minutes      10 gigabytes
8        10^9       31 hours        1 terabyte
10       10^11      129 days        101 terabytes
12       10^13      35 years        10 petabytes
14       10^15      3523 years      1 exabyte
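
The NODES column can be checked against the formula for the number of nodes generated by BFS. The sketch below assumes b = 10, 10,000 nodes generated per second and 1000 bytes per node; these three figures are assumptions chosen because they reproduce the first rows of the table, they are not stated in the notes.

```python
def bfs_nodes_generated(b, d):
    """b + b^2 + ... + b^d + (b^(d+1) - b)"""
    return sum(b**i for i in range(1, d + 1)) + (b**(d + 1) - b)


NODES_PER_SECOND = 10_000   # assumed generation rate
BYTES_PER_NODE = 1_000      # assumed storage per node

for depth in (2, 4, 6):
    nodes = bfs_nodes_generated(10, depth)
    seconds = nodes / NODES_PER_SECOND
    megabytes = nodes * BYTES_PER_NODE / 2**20
    print(depth, nodes, round(seconds, 2), "seconds", round(megabytes), "megabytes")
```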


Depth First Search 

Looks for the goal node among all the children of the current node before using the sibling of this 
node i.e. expand deepest unexpanded node. 

Fringe is implemented as a LIFO queue (=stack) 






Completeness:

- Does it always find a solution if one exists?

- NO

- If the search space is infinite or contains loops then DFS may not
find a solution.

Time complexity:

- Let m be the maximum depth of the search tree. In the worst case the solution may
exist at depth m.

- The root has b successors, each node at the next level again has b successors (total $b^2$), and so on.

- Worst case: expand all except the last node at depth m.

- Total no. of nodes generated:

$b + b^2 + b^3 + \dots + b^m = O(b^m)$

Space complexity:

- It needs to store only a single path from the root node to a leaf node, along with
the remaining unexpanded sibling nodes for each node on the path.

- Total no. of nodes in memory:

$1 + b + b + \dots + b$ (m times) $= O(bm)$

Optimal (i.e., admissible):

- DFS expands the deepest node first; it expands the entire left sub-tree even if the right sub-tree
contains goal nodes at levels 2 or 3. Thus DFS may not always give an
optimal solution.
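
A minimal Python sketch of depth-first search with a LIFO fringe is given below. The graph and names are hypothetical; the check against the current path is one simple way to avoid loops and is an assumption of this sketch.

```python
# Hypothetical example graph: each node maps to its list of successors.
GRAPH = {
    "S": ["A", "D"],
    "A": ["B", "D"],
    "B": ["C", "E"],
    "D": ["E"],
    "E": ["F"],
    "F": ["G"],
}


def depth_first_search(graph, start, goal):
    """Return the first path found from start to goal, or None."""
    fringe = [[start]]          # LIFO stack of paths
    while fringe:
        path = fringe.pop()     # deepest unexpanded node
        node = path[-1]
        if node == goal:
            return path
        # Push children in reverse so the leftmost child is expanded first.
        for child in reversed(graph.get(node, [])):
            if child not in path:   # do not loop back along the current path
                fringe.append(path + [child])
    return None


print(depth_first_search(GRAPH, "S", "G"))
```

On this graph DFS returns S, A, B, E, F, G (five links) although a four-link path S, D, E, F, G exists, which illustrates why DFS is not optimal.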


Uniform Cost Search: 
Uniform-cost search (UCS) is a modified version of BFS that makes it optimal. It is basically a
tree search algorithm used for traversing or searching a weighted tree or
graph. The search begins at the root node. The search continues by visiting the next node
which has the least total cost from the root. Nodes are visited in this manner until a goal state
is reached.

Typically, the search algorithm involves expanding nodes by adding all unexpanded
neighboring nodes that are connected by directed paths to a priority queue. In the queue,
each node is associated with its total path cost from the root, where the least-cost paths are
given highest priority. The node at the head of the queue is subsequently expanded, adding
the next set of connected nodes with the total path cost from the root to the respective node.
The uniform-cost search is complete and optimal if the cost of each step exceeds some
positive bound ε.

It does not care about the number of steps, only about the total cost.

•Complete? Yes, if every step cost ≥ ε (a small positive number).

•Time? In the worst case at least that of BFS (with equal step costs it behaves like BFS).
•Space? In the worst case at least that of BFS.

•Optimal? Yes
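
The priority-queue behaviour described above can be sketched in Python with the standard heapq module. The weighted graph and the function name are hypothetical and serve only as an illustration.

```python
import heapq

# Hypothetical weighted graph: node -> {successor: step cost}.
GRAPH = {
    "S": {"A": 1, "B": 5, "C": 15},
    "A": {"G": 10},
    "B": {"G": 5},
    "C": {"G": 5},
}


def uniform_cost_search(graph, start, goal):
    """Return (cost, path) of a least-cost path from start to goal, or None."""
    fringe = [(0, start, [start])]   # priority queue ordered by path cost
    best = {start: 0}                # cheapest known cost to each node
    while fringe:
        cost, node, path = heapq.heappop(fringe)
        if node == goal:             # goal test when the node is expanded
            return cost, path
        if cost > best.get(node, float("inf")):
            continue                 # a cheaper path to this node was found later
        for child, step in graph.get(node, {}).items():
            new_cost = cost + step
            if new_cost < best.get(child, float("inf")):
                best[child] = new_cost
                heapq.heappush(fringe, (new_cost, child, path + [child]))
    return None


print(uniform_cost_search(GRAPH, "S", "G"))
```

Here the goal is first generated through A with cost 11, but the search keeps going and returns the cheaper path S, B, G with cost 10, because the goal test is applied only when a node is taken from the head of the queue.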

Consider an example:

[Figure: uniform-cost search on a weighted tree rooted at A; the goal node G2 is reached with path cost 21.]

1. Start with root node A.

2. Paths from the root are generated.

3. Since B has the least cost, we expand it.

4. Of our 3 choices, C has the least cost so we'll expand it.

5. Node H has the least cost thus far, so we expand it.

6. We have a goal, G2, but need to expand other branches to see if there is another goal with less distance.

Note: Both nodes F and N have a cost of 15; we chose to expand the leftmost node first. We
continue expanding until all remaining paths are greater than 21, the cost of G2.


Depth Limited Search:

The problem of unbounded trees can be solved by supplying depth-first search with a predetermined
depth limit l (nodes at depth l are treated as if they have no successors). This is depth limited search.
It is a modification of depth-first search and is used, for example, in the iterative deepening
depth-first search algorithm.

Like the normal depth-first search, depth-limited search is an uninformed search. It works exactly
like depth-first search, but avoids its drawbacks regarding completeness by imposing a maximum
limit on the depth of the search. Even if the search could still expand a vertex beyond that depth, it
will not do so and thereby it will not follow infinitely deep paths or get stuck in cycles. Therefore
depth-limited search always terminates, and it will find a solution if one lies within the depth limit.

It solves the infinite-path problem of DFS. Yet it introduces another problem if we are
unable to make a good guess of l. Let d be the depth of the shallowest solution.

If l < d then incompleteness results.

If l > d then the solution found may not be optimal.

Time complexity: $O(b^l)$

Space complexity: $O(bl)$
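
The effect of the limit l can be seen in a short recursive sketch. The graph and the function name are hypothetical; the shallowest goal in this graph is at depth d = 4.

```python
# Hypothetical example graph: each node maps to its list of successors.
GRAPH = {
    "S": ["A", "D"],
    "A": ["B", "D"],
    "B": ["C", "E"],
    "D": ["E"],
    "E": ["F"],
    "F": ["G"],
}


def depth_limited_search(graph, node, goal, limit, path=None):
    """Depth-first search that treats nodes at depth `limit` as leaves."""
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:
        return None             # cut off: behave as if there were no successors
    for child in graph.get(node, []):
        if child not in path:
            found = depth_limited_search(graph, child, goal, limit - 1, path)
            if found is not None:
                return found
    return None


for l in (3, 4, 5):
    print(l, depth_limited_search(GRAPH, "S", "G", l))
```

With l = 3 (l < d) no solution is found, with l = 4 the shallowest solution S, D, E, F, G is found, and with l = 5 (l > d) the longer path S, A, B, E, F, G is returned first.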


Iterative Deepening Depth First Search: 

In this strategy, depth-limited search is run repeatedly, increasing the depth limit with each 
iteration until it reaches d, the depth of the shallowest goal state. On each iteration, IDDFS visits 
the nodes in the search tree in the same order as depth-first search, but the cumulative order in 
which nodes are first visited, assuming no pruning, is effectively breadth-first. 

IDDFS combines depth-first search's space-efficiency and breadth-first search's completeness 
(when the branching factor is finite). It is optimal when the path cost is a non-decreasing function 
of the depth of the node. 

The technique of iterative deepening is based on this idea. Iterative deepening is depth-first search 
to a fixed depth in the tree being searched. If no solution is found up to this depth then the depth 
to be searched is increased and the whole 'bounded' depth-first search begun again. 

It works by setting a depth of search (say, depth 1) and doing depth-first search to that depth. If a
solution is found then the process stops; otherwise, increase the depth by, say, 1 and repeat until
a solution is found. Note that every time we start up a new bounded depth search we start from
scratch, i.e. we throw away any results from the previous search.

Now iterative deepening is a popular method of search. We explain why this is so.

Depth-first search can be implemented to be much cheaper than breadth-first search in terms
of memory usage, but it is not guaranteed to find a solution even where one exists.

On the other hand, breadth-first search can be guaranteed to terminate if there is a winning
state to be found and will always find the 'quickest' solution (in terms of how many steps
need to be taken from the root node). It is, however, a very expensive method in terms of
memory usage.

Iterative deepening is liked because it is an effective compromise between the two other
methods of search. It is a form of depth-first search with an upper bound on how deep the
search can go, and that bound is raised on each iteration. Iterative deepening terminates if there
is a solution. It can produce the same solution that breadth-first search would produce but does
not require the same memory usage (as for breadth-first search).

Note that depth-first search achieves its efficiency by generating the next node to explore
only when this is needed. The breadth-first search algorithm has to grow all the search paths
available until a solution is found, and this takes up memory. Iterative deepening achieves
its memory saving in the same way that depth-first search does, at the expense of redoing
some computations again and again (a time cost rather than a memory one). In the search
illustrated, we had to visit node d three times in all!

• Complete (like BFS) 

• Has linear memory requirements (like DFS) 

• Classical time-space tradeoff. 

• This is the preferred method for large state spaces, where the solution path length is 
unknown. 

The overall idea is as follows: the depth limit is increased gradually until the goal node
is found.
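
A minimal Python sketch of this idea, under the assumption of a small hypothetical graph and an arbitrary cap max_depth on the number of iterations:

```python
# Hypothetical example graph: each node maps to its list of successors.
GRAPH = {
    "S": ["A", "D"],
    "A": ["B", "D"],
    "B": ["C", "E"],
    "D": ["E"],
    "E": ["F"],
    "F": ["G"],
}


def bounded_dfs(graph, path, goal, limit):
    """Depth-first search from the end of `path`, at most `limit` levels deeper."""
    node = path[-1]
    if node == goal:
        return path
    if limit == 0:
        return None
    for child in graph.get(node, []):
        if child not in path:
            found = bounded_dfs(graph, path + [child], goal, limit - 1)
            if found is not None:
                return found
    return None


def iterative_deepening_search(graph, start, goal, max_depth=50):
    """Run bounded DFS with limits 0, 1, 2, ... and return (limit, path)."""
    for limit in range(max_depth + 1):
        # Each iteration starts from scratch; earlier results are thrown away.
        found = bounded_dfs(graph, [start], goal, limit)
        if found is not None:
            return limit, found
    return None


print(iterative_deepening_search(GRAPH, "S", "G"))
```

The search fails for limits 0 to 3 and succeeds at limit 4, returning the same four-link path that breadth-first search would find.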



Iterative Deepening search evaluation: 
Completeness:

- YES (no infinite paths)

Time complexity:

- Algorithm seems costly due to repeated generation of certain states.

- Node generation (how many times the nodes at each level are generated):

level d: once
level d-1: 2 times
level d-2: 3 times
...
level 2: d-1 times
level 1: d times

- Total no. of nodes generated:

$d \cdot b + (d-1) \cdot b^2 + (d-2) \cdot b^3 + \dots + 1 \cdot b^d = O(b^d)$

Space complexity:

- It needs to store only a single path from the root node to a leaf node, along with
the remaining unexpanded sibling nodes for each node on the path.

- Total no. of nodes in memory:

$1 + b + b + \dots + b$ (d times) $= O(bd)$


Optimality: 

- YES if path cost is non-decreasing function of the depth of the node. 


Notice that BFS generates some nodes at depth d+1, whereas IDS does not. The result is that
IDS is actually faster than BFS, despite the repeated generation of nodes.

Example: Number of nodes generated for b=10 and d=5, with the solution at the far right

N(IDS) = 50 + 400 + 3000 + 20000 + 100000 = 123450 

N(BFS) = 10 + 100 + 1000 + 10000 + 100000 + 999990 = 1111100 
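
The two totals can be reproduced directly from the formulas for the number of nodes generated. The function names below are hypothetical.

```python
def ids_nodes(b, d):
    """d*b + (d-1)*b^2 + ... + 1*b^d: level i is generated (d - i + 1) times."""
    return sum((d - i + 1) * b**i for i in range(1, d + 1))


def bfs_nodes(b, d):
    """b + b^2 + ... + b^d + (b^(d+1) - b)"""
    return sum(b**i for i in range(1, d + 1)) + (b**(d + 1) - b)


print(ids_nodes(10, 5))   # 123450
print(bfs_nodes(10, 5))   # 1111100
```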

Bidirectional Search: 

This is a search algorithm which replaces a single search graph (which is likely to grow
exponentially) with two smaller graphs: one starting from the initial state and one starting from
the goal state. It then expands nodes from the start and goal state simultaneously. At each stage
it checks whether the nodes of one search have been generated by the other, i.e. whether they
meet in the middle. If so, the concatenation of the two paths is the solution.

• Completeness: yes

• Optimality: yes (if done with the correct strategy, e.g. breadth first)

• Time complexity: $O(b^{d/2})$

• Space complexity: $O(b^{d/2})$


Problems: generating predecessors; many goal states; an efficient check for whether a node has
already been visited by the other half of the search; and what kind of search to use in each half.

Drawbacks of uninformed search:


• Criterion to choose next node to expand depends only on a global criterion: level. 

• Does not exploit the structure of the problem. 

• One may prefer to use a more flexible rule that takes advantage of what is being 
discovered on the way, and hunches about what can be a good move. 

• Very often, we can select which rule to apply by comparing the current state and the 
desired state 


Heuristic Search: 

Heuristic search uses domain-dependent (heuristic) information in order to search the space more
efficiently.

Ways of using heuristic information: 

• Deciding which node to expand next, instead of doing the expansion in a strictly breadth-first or 
depth-first order; 

• In the course of expanding a node, deciding which successor or successors to generate, instead 
of blindly generating all possible successors at one time; 

• Deciding that certain nodes should be discarded, or pruned, from the search space. 
Heuristic Searches - Why Use?

• It may be too resource intensive (both time and space) to use a blind search

• Even if a blind search will work we may want a more efficient search method

Informed Search uses domain specific information to improve the search pattern

- Define a heuristic function, h(n), that estimates the "goodness" of a node n. 

- Specifically, h(n) = estimated cost (or distance) of minimal cost path from n to a 
goal state. 

- The heuristic function is an estimate, based on domain-specific information that is 
computable from the current state description, of how close we are to a goal. 

Best-First Search 

Idea: use an evaluation function f(n) for each node that gives an indication of which node to
expand next.

- f(n) usually gives an estimate of the cost to reach the goal.

- the node with the lowest value is expanded first.

A key component of f(n) is a heuristic function, h(n), which encodes additional knowledge of the
problem.

There is a whole family of best-first search strategies, each with a different evaluation function. 
Typically, strategies use estimates of the cost of reaching the goal and try to minimize it. 

Special cases: based on the evaluation function. 

- Greedy best-first search 

- A* search 

Greedy Best First Search 

The best-first search part of the name means that it uses an evaluation function to select which 
node is to be expanded next. The node with the lowest evaluation is selected for expansion because 
that is the best node, since it supposedly has the closest path to the goal (if the heuristic is good). 
Unlike A* which uses both the link costs and a heuristic of the cost to the goal, greedy best-first 
search uses only the heuristic, and not any link costs. A disadvantage of this approach is that if the 
heuristic is not accurate, it can go down paths with high link cost since there might be a low heuristic 
for the connecting node. 

Evaluation function f(n) = h(n) (heuristic) = estimate of cost from n to goal.
e.g., hSLD(n) = straight-line distance from n to goal

Greedy best-first search expands the node that appears to be closest to goal. The greedy best-first
search algorithm is $O(b^m)$ in terms of space and time complexity (where b is the average branching
factor, i.e. the average number of successors from a state, and m is the maximum depth of the search
tree).
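
A minimal Python sketch of greedy best-first search with f(n) = h(n) is given below. The map fragment ROADS and the values in H_SLD follow the commonly used textbook version of the Romania example and should be read as assumptions of this sketch; the function name is hypothetical.

```python
import heapq

# Assumed fragment of the road map: city -> {neighbour: road length}.
ROADS = {
    "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Zerind": {"Arad": 75},
    "Timisoara": {"Arad": 118},
    "Sibiu": {"Arad": 140, "Oradea": 151, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Oradea": {"Sibiu": 151},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
    "Bucharest": {},
}

# Assumed straight-line distances to Bucharest.
H_SLD = {
    "Arad": 366, "Zerind": 374, "Timisoara": 329, "Sibiu": 253, "Oradea": 380,
    "Fagaras": 176, "Rimnicu Vilcea": 193, "Pitesti": 100, "Bucharest": 0,
}


def greedy_best_first(roads, h, start, goal):
    """Always expand the node with the lowest h; link costs are ignored."""
    fringe = [(h[start], start, [start])]
    visited = set()
    while fringe:
        _, city, path = heapq.heappop(fringe)
        if city == goal:
            return path
        if city in visited:
            continue
        visited.add(city)
        for nxt in roads[city]:
            if nxt not in visited:
                heapq.heappush(fringe, (h[nxt], nxt, path + [nxt]))
    return None


path = greedy_best_first(ROADS, H_SLD, "Arad", "Bucharest")
cost = sum(ROADS[a][b] for a, b in zip(path, path[1:]))
print(path, cost)
```

Under these assumptions the route found is Arad, Sibiu, Fagaras, Bucharest with a road length of 450, which is not the cheapest route: the heuristic alone cannot see that the link from Fagaras to Bucharest is long.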

Example: Given the following graph of cities, starting at Arad, the problem is to reach
Bucharest.

[Figure: road map of the cities]

Straight-line distance to Bucharest:

Arad              366
Bucharest           0
Craiova           160
Dobreta           242
Eforie            161
Fagaras           176
Giurgiu            77
Hirsova           151
Iasi              226
Lugoj             244
Mehadia           241
Neamt             234
Oradea            380
Pitesti           100
Rimnicu Vilcea    193
Sibiu             253
Timisoara         329
Urziceni           80
Vaslui            199
Zerind            374


Solution using greedy best first can be as below:

[Figure: greedy best-first search tree starting from Arad (h = 366)]


A* Search : A Better Best-First Strategy 

Greedy Best-first search 

• minimizes estimated cost h(n) from current node n to goal; 

• is informed but (almost always) suboptimal and incomplete. 

Admissible Heuristic: 


A heuristic function is said to be admissible if it is no more than the lowest-cost path to the goal. In 
other words, a heuristic is admissible if it never overestimates the cost of reaching the goal. An 
admissible heuristic is also known as an optimistic heuristic. 

An admissible heuristic is used to estimate the cost of reaching the goal state in an informed
search algorithm. In order for a heuristic to be admissible to the search problem, the
estimated cost must never be higher than the actual cost of reaching the goal state. The
search algorithm uses the admissible heuristic to find an estimated optimal path to the goal
state from the current node. For example, in A* search the evaluation function (where n is
the current node) is: f(n) = g(n) + h(n)

where;


f(n) = the evaluation function.

g(n) = the cost from the start node to the current node
h(n) = estimated cost from current node to goal.

h(n) is calculated using the heuristic function. With a non-admissible heuristic, the A*
algorithm could overlook the optimal solution to a search problem due to an overestimation
in f(n).

The SLD heuristic function is admissible, since no road between two towns can be shorter than
the straight line between them.
Formulating admissible heuristics:

n is a node
h is a heuristic
h(n) is cost indicated by h to reach a goal from n
C(n) is the actual cost to reach a goal from n
h is admissible if

∀n, h(n) ≤ C(n)


For Example: 8-puzzle

The figure shows an 8-puzzle start state and goal state. The
solution is 26 steps long.

[Figure: 8-puzzle start state and goal state]

h1(n) = number of misplaced tiles

h2(n) = sum of the distances of the tiles from their goal positions (not diagonal)

h1(S) = 8

h2(S) = 3+1+2+2+2+3+3+2 = 18

hn(S) = max{h1(S), h2(S)} = 18
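
The two heuristics can be computed with a few lines of Python. Since the figure is not reproduced here, the START and GOAL boards below are an assumption: they are the commonly used textbook instance whose tile distances match the sum 3+1+2+2+2+3+3+2 above. Boards are written row by row, with 0 for the blank.

```python
# Assumed boards (row by row, 0 is the blank).
START = (7, 2, 4,
         5, 0, 6,
         8, 3, 1)
GOAL = (0, 1, 2,
        3, 4, 5,
        6, 7, 8)


def h1(state, goal=GOAL):
    """Number of misplaced tiles (the blank is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)


def h2(state, goal=GOAL):
    """Sum of horizontal plus vertical distances of tiles from their goal squares."""
    total = 0
    for tile in range(1, 9):
        r1, c1 = divmod(state.index(tile), 3)
        r2, c2 = divmod(goal.index(tile), 3)
        total += abs(r1 - r2) + abs(c1 - c2)
    return total


print(h1(START), h2(START), max(h1(START), h2(START)))
```

Both values are below the true solution length of 26, as required of admissible heuristics, and taking the maximum of two admissible heuristics is again admissible.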


Consistency (Monotonicity)

A heuristic is said to be consistent if, for any node N and any successor N' of N, the estimated cost to
reach the goal from node N is no greater than the sum of the step cost from N to N' and the estimated
cost from node N' to the goal node,

i.e. h(n) ≤ c(n, n') + h(n')

Where;


h(n) = estimated cost to reach the goal node from node n
c(n, n') = actual cost from n to n'

A* Search: 


A* is a best first, informed graph search algorithm. A* is different from other best first search
algorithms in that it uses a heuristic function h(x) as well as the path cost to the node g(x), in
computing the cost f(x) = h(x) + g(x) for the node. The h(x) part of the f(x) function must be an
admissible heuristic; that is, it must not overestimate the distance to the goal. Thus, for an
application like routing, h(x) might represent the straight-line distance to the goal, since that
is physically the smallest possible distance between any two points or nodes.

It finds a minimal-cost path joining the start node and a goal node.
Evaluation function: f(n) = g(n) + h(n)

Where,


g(n) = cost so far to reach n from root

h(n) = estimated cost to goal from n

f(n) = estimated total cost of path through n to goal

A* search

• combines the two by minimizing f(n) = g(n) + h(n);

• is informed and, under reasonable assumptions, optimal and complete.

As A* traverses the graph, it follows a path of the lowest known cost, keeping a sorted priority
queue of alternate path segments along the way. If, at any point, a segment of the path being
traversed has a higher cost than another encountered path segment, it abandons the higher-cost
path segment and traverses the lower-cost path segment instead. This process continues until the
goal is reached.
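
A minimal Python sketch of A* with a priority queue ordered by f(n) = g(n) + h(n) follows. The map fragment ROADS and the values in H_SLD follow the commonly used textbook version of the Romania example and are assumptions of this sketch; the function name is hypothetical.

```python
import heapq

# Assumed fragment of the road map: city -> {neighbour: road length}.
ROADS = {
    "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Zerind": {"Arad": 75},
    "Timisoara": {"Arad": 118},
    "Sibiu": {"Arad": 140, "Oradea": 151, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Oradea": {"Sibiu": 151},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
    "Bucharest": {},
}

# Assumed straight-line distances to Bucharest (the heuristic h).
H_SLD = {
    "Arad": 366, "Zerind": 374, "Timisoara": 329, "Sibiu": 253, "Oradea": 380,
    "Fagaras": 176, "Rimnicu Vilcea": 193, "Pitesti": 100, "Bucharest": 0,
}


def a_star(roads, h, start, goal):
    """Return (cost, path) of a least-cost path, assuming h is admissible."""
    fringe = [(h[start], 0, start, [start])]   # entries are (f, g, city, path)
    best_g = {start: 0}
    while fringe:
        f, g, city, path = heapq.heappop(fringe)
        if city == goal:
            return g, path
        if g > best_g.get(city, float("inf")):
            continue                           # a cheaper route here was found
        for nxt, step in roads[city].items():
            g2 = g + step
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(fringe, (g2 + h[nxt], g2, nxt, path + [nxt]))
    return None


print(a_star(ROADS, H_SLD, "Arad", "Bucharest"))
```

Under these assumptions Bucharest is first generated through Fagaras with f = 450, but Pitesti (f = 417) is expanded before it, and the search returns the route Arad, Sibiu, Rimnicu Vilcea, Pitesti, Bucharest with cost 418.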


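A* Search Example

[Figure: successive A* search trees for the route from Arad to Bucharest; each node is labeled f = g + h]

Initial state:

Arad: 366 = 0 + 366

After expanding Arad:

Sibiu: 393 = 140 + 253
Timisoara: 447 = 118 + 329
Zerind: 449 = 75 + 374

After expanding Sibiu:

Arad: 646 = 280 + 366
Fagaras: 415 = 239 + 176
Oradea: 671 = 291 + 380
Rimnicu Vilcea: 413 = 220 + 193
Timisoara: 447 = 118 + 329
Zerind: 449 = 75 + 374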


Admissibility and Optimality: 

A* is admissible and considers fewer nodes than any other admissible search algorithm with 
the same heuristic. This is because A* uses an "optimistic" estimate of the cost of a path 
through every node that it considers—optimistic in that the true cost of a path through that 
node to the goal will be at least as great as the estimate. But, critically, as far as A* "knows", 
that optimistic estimate might be achievable. 



Here is the main idea of the proof:

When A* terminates its search, it has found a path whose actual cost is lower than the 
estimated cost of any path through any open node. But since those estimates are optimistic, 
A* can safely ignore those nodes. In other words, A* will never overlook the possibility of 
a lower-cost path and so is admissible. 


Suppose, now that some other search algorithm B terminates its search with a path whose 
actual cost is not less than the estimated cost of a path through some open node. Based on 
the heuristic information it has, Algorithm B cannot rule out the possibility that a path 
through that node has a lower cost. So while B might consider fewer nodes than A*, it cannot 
be admissible. Accordingly, A* considers the fewest nodes of any admissible search 
algorithm. 

This is only true if both: 

• A* uses an admissible heuristic. Otherwise, A* is not guaranteed to expand fewer nodes 
than another search algorithm with the same heuristic. 

• A* solves only one search problem rather than a series of similar search problems. 
Otherwise, A* is not guaranteed to expand fewer nodes than incremental heuristic search 
algorithms 

Thus, if the estimated distance h(n) never exceeds the true distance h*(n) from the current
node to the goal node, the A* algorithm will always find a shortest path. This is known as the
admissibility of the A* algorithm, and h(n) is an admissible heuristic.

IF 0 ≤ h(n) ≤ h*(n), and costs of all arcs are positive

THEN A* is guaranteed to find a solution path of minimal cost if any solution path exists.


Theorem: A* is optimal if h(n) is admissible.

Suppose a suboptimal goal G2 is in the queue.

Let n be an unexpanded node on a shortest path to the optimal goal
G, and let C* be the cost of the optimal solution.

f(G2) = h(G2) + g(G2)

f(G2) = g(G2), since h(G2) = 0

f(G2) > C* ....(1)   (since G2 is suboptimal)

Again, since h(n) is admissible, it does not overestimate the cost of completing the solution path:

f(n) = g(n) + h(n) ≤ C* ....(2)

Now from (1) and (2): f(n) ≤ C* < f(G2)

Since f(G2) > f(n), A* will never select G2 for expansion before n. Thus A* gives us an optimal
solution when the heuristic function is admissible.

Theorem: If h(n) is consistent, then the values of f(n) along the path are non-decreasing. 

Suppose n' is successor of n, then 

g(n') = g(n) + C(n, a, n') 

We know that, 

f(n') = g(n') + h(n') 

f(n') = g(n) + C(n, a, n') + h(n').(1) 

A heuristic is consistent if

h(n) ≤ C(n, a, n') + h(n') ....(2)

Now from (1) and (2)

f(n') = g(n) + C(n, a, n') + h(n') ≥ g(n) + h(n) = f(n)
f(n') ≥ f(n)

Hence f(n) is non-decreasing along any path.

One more example: Maze Traversal (for A* Search) 

Problem: To get from square A3 to square E2, one step at a time, avoiding obstacles (black squares). 
Operators: (in order) 

• go_left(n) 

• go_down(n) 

• go_right(n) 

Each operator costs 1. 

Heuristic: Manhattan distance 
Start Position: A3 
Goal: E2 



[Figure: maze grid with columns numbered 1 to 5]



Hill Climbing Search: 

Hill climbing can be used to solve problems that have many solutions, some of which are 
better than others. It starts with a random (potentially poor) solution, and iteratively 
makes small changes to the solution, each time improving it a little. When the algorithm 
cannot see any improvement anymore, it terminates. Ideally, at that point the current 
solution is close to optimal, but it is not guaranteed that hill climbing will ever come close 
to the optimal solution. 

For example, hill climbing can be applied to the traveling salesman problem. It is easy to
find a solution that visits all the cities, but it will be very poor compared to the optimal solution.
The algorithm starts with such a solution and makes small improvements to it, such as
switching the order in which two cities are visited. Eventually, a much better route is
obtained. In hill climbing the basic idea is to always head towards a state which is better than
the current one. So, if you are at town A and you can get to town B and town C (and your
target is town D) then you should make a move IF town B or C appears nearer to town D than
town A does.

The hill climbing can be described as follows: 

1. Start with current-state = initial-state. 

2. Until current-state = goal-state OR there is no change in current-state do: 

• Get the successors of the current state and use the evaluation function to assign a 
score to each successor. 

• If one of the successors has a better score than the current-state then set the new 
current-state to be the successor with the best score. 

Hill climbing terminates when there are no successors of the current state which are better than 
the current state itself. 
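
The loop above can be sketched in Python as follows. The names are hypothetical, a higher score is taken to be better, and the explicit goal test is left out for brevity, so this sketch simply stops when no successor improves on the current state.

```python
def hill_climb(initial, successors, score):
    """Move to the best-scoring successor until no successor is better."""
    current = initial
    while True:
        candidates = successors(current)
        if not candidates:
            return current
        best = max(candidates, key=score)
        if score(best) <= score(current):
            return current      # no better neighbor: local maximum or plateau
        current = best


# Hypothetical one-dimensional landscape: index -> evaluation.
VALUES = [1, 3, 2, 5, 9, 4]


def neighbours(i):
    """Indices directly left and right of i that lie inside the landscape."""
    return [j for j in (i - 1, i + 1) if 0 <= j < len(VALUES)]


def evaluate(i):
    return VALUES[i]


print(hill_climb(0, neighbours, evaluate))   # stops at index 1 (value 3)
print(hill_climb(2, neighbours, evaluate))   # reaches index 4 (value 9)
```

Starting from index 0 the search halts at the local maximum with value 3, while starting from index 2 it reaches the best value 9; the result depends on the starting state.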

Hill climbing is depth-first search with a heuristic measurement that orders choices as nodes are 
expanded. It always selects the most promising successor of the node last expanded. 

For instance, consider that the most promising successor of a node is the one that has the shortest 
straight-line distance to the goal node G. In figure below, the straight line distances between each 
city and goal G is indicated in square brackets, i.e. the heuristic. 



The hill climbing search from S to G proceeds as follows: 

Apply the hill climbing algorithm to find a path from S to G, considering that the most promising 
successor of a node is its closest neighbor. 



Note: 

The difference between the hill climbing search method and the best first search method is the 
following one: 

• the best first search method selects for expansion the most promising leaf node of the 
current search tree; 

• the hill climbing search method selects for expansion the most promising successor of the 
node last expanded. 


Problems with Hill Climbing 

- Local maxima: the search gets stuck when it reaches a position where there are no better 
neighbors, which is no guarantee that the best solution has been found. A ridge is a 
sequence of local maxima. 

- Plateaus: another problem with hill climbing searches is reaching a 
plateau. This is an area where the search space is flat, so that all neighbors return 
the same evaluation. 



Shiv Raj Pant 









Artificial Intelligence 


Simulated Annealing: 

It is motivated by the physical annealing process in which material is heated and slowly 
cooled into a uniform structure. Compared to hill climbing, the main difference is that SA 
allows downward steps. Simulated annealing also differs from hill climbing in that a move 
is selected at random and the algorithm then decides whether to accept it. If the move is better than the 
current position, simulated annealing will always take it. If the move is worse (i.e. of lesser 
quality), it is accepted only with some probability. A worse state is accepted when 

exp(-c / t) > r 

Where 

c = the change in the evaluation function 

t = the current temperature 

r = a random number between 0 and 1 

The probability of accepting a worse state, exp(-c / t), is a function of both the current temperature and the change in 
the cost function. The most common way of implementing an SA algorithm is to implement hill 
climbing with an accept function and modify it for SA. 

By analogy with this physical process, each step of the SA algorithm replaces the current solution 
by a random "nearby" solution, chosen with a probability that depends on the difference between 
the corresponding function values and on a global parameter T (called the temperature), that is 
gradually decreased during the process. The dependency is such that the current solution changes 
almost randomly when T is large, but increasingly "downhill" as T goes to zero. The allowance for 
"uphill" moves saves the method from becoming stuck at local optima—which are the bane of 
greedier methods. 
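
A minimal sketch of this acceptance rule, written for a maximisation problem, is given below. The cooling schedule (multiplying t by 0.95 each step), the parameter names and the toy landscape are all assumptions made for illustration; they are not prescribed by the text.

```python
import math
import random


def simulated_annealing(initial, random_neighbour, score,
                        t_start=10.0, cooling=0.95, t_min=1e-3, rng=None):
    """Return the best state seen while annealing from `initial`."""
    rng = rng or random.Random(0)  # fixed seed so that runs are repeatable
    current = best = initial
    t = t_start
    while t > t_min:
        candidate = random_neighbour(current, rng)
        c = score(current) - score(candidate)  # c > 0 means the move is worse
        # Better (or equal) moves are always taken; worse ones only if exp(-c/t) > r.
        if c <= 0 or math.exp(-c / t) > rng.random():
            current = candidate
        if score(current) > score(best):
            best = current
        t *= cooling  # assumed geometric cooling schedule
    return best


# Toy landscape (hypothetical) with a local maximum at index 1.
heights = [1, 3, 2, 5, 9, 4]


def random_step(i, rng):
    j = i + rng.choice([-1, 1])
    return min(max(j, 0), len(heights) - 1)


found = simulated_annealing(0, random_step, lambda i: heights[i])
print(found, heights[found])
```

Because worse moves are sometimes accepted while t is large, the search has a chance of leaving the local maximum at index 1; whether it does on a given run depends on the random draws.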

Game Search: 


Games are a form of multi-agent environment 

- What do other agents do and how do they affect our success? 

- Cooperative vs. competitive multi-agent environments. 

- Competitive multi-agent environments give rise to adversarial search often known as 
games 

Games - adversary 

- Solution is strategy (strategy specifies move for every possible opponent reply). 

- Time limits force an approximate solution 

- Evaluation function: evaluate "goodness" of game position 

- Examples: chess, checkers, Othello, backgammon 

Difference between the search space of a game and the search space of a problem: In the first case 
it represents the moves of two (or more) players, whereas in the latter case it represents the 
"moves" of a single problem-solving agent. 

An exemplary game: Tic-tac-toe 

There are two players denoted by X and O. They alternately write their letter in one of the 
9 cells of a 3 by 3 board. The winner is the one who succeeds in writing three letters in a line. 

The game begins with an empty board. It ends in a win for one player and a loss for the other, or 
possibly in a draw. 

A complete tree is a representation of all the possible plays of the game. The root node is the initial 
state, in which it is the first player's turn to move (the player X). 

The successors of the initial state are the states the player can reach in one move, their successors 
are the states resulting from the other player's possible replies, and so on. 

Terminal states are those representing a win for X, loss for X, or a draw. 

Each path from the root node to a terminal node gives a different complete play of the game. Figure 
given below shows the initial search space of Tic-Tac-Toe. 


Fig: Partial game tree for Tic-Tac-Toe (levels alternate between MAX (X) and MIN (O); TERMINAL states have utility -1, 0 or +1) 


A game can be formally defined as a kind of search problem as below: 


• Initial state: It includes the board position and identifies the player to move. 

• Successor function: It gives a list of (move, state) pairs, each indicating a legal move and 
the resulting state. 

• Terminal test: This determines when the game is over. States where the game has ended are 
called terminal states. 

• Utility function: It gives a numerical value to terminal states, e.g. win (+1), lose (-1) and draw 
(0). Some games have a wider variety of possible outcomes, e.g. ranging from +192 to -192. 


The Minimax Algorithm: 

Let us assign the following values for the game: 1 for win by X, 0 for draw, -1 for loss by X. 

Given the values of the terminal nodes (win for X (1), loss for X (-1), or draw (0)), the values of the 
non-terminal nodes are computed as follows: 

• the value of a node where it is the turn of player X to move is the maximum of the values 
of its successors (because X tries to maximize its outcome); 

• the value of a node where it is the turn of player O to move is the minimum of the values 
of its successors (because O tries to minimize the outcome of X). 

Figure below shows how the values of the nodes of the search tree are computed from the values 
of the leaves of the tree. The values of the leaves of the tree are given by the rules of the game: 

• 1 if there are three X in a row, column or diagonal; 

• -1 if there are three O in a row, column or diagonal; 

• 0 otherwise. 
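
The backing-up rule above can be tried directly on Tic-tac-toe. The sketch below is one possible encoding and not the only one: the board is assumed to be a 9-character string of "X", "O" and spaces, and the names `LINES`, `winner` and `minimax` are chosen for this example. Results are cached so that repeated positions are evaluated only once.

```python
from functools import lru_cache

# The eight winning lines of the 3 by 3 board, as cell indices 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a line, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Minimax value of `board` from X's point of view, with `player` to move."""
    w = winner(board)
    if w == "X":
        return 1
    if w == "O":
        return -1
    if " " not in board:
        return 0  # full board without a winner: draw
    other = "O" if player == "X" else "X"
    values = [minimax(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == " "]
    return max(values) if player == "X" else min(values)


print(minimax(" " * 9, "X"))      # 0: with perfect play the game is a draw
print(minimax("XX OO    ", "X"))  # 1: X completes the top row
print(minimax("XX OO    ", "O"))  # -1: O completes the middle row
```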
An Example: 

Consider the following game tree (drawn from the point of view of the Maximizing player): 



Show what moves should be chosen by the two players, assuming that both are using the mini-max 
procedure. 

Solution: 



Figure 3.16: The mini-max path for the game tree 


Alpha-Beta Pruning: 

The problem with minimax search is that the number of game states it has to examine is exponential 
in the number of moves. Unfortunately, we can't eliminate the exponent, but we can effectively 
cut it in half. The idea is to compute the correct minimax decision without looking at every node in 
the game tree, by eliminating large parts of the tree from consideration; this is the concept 
behind pruning. The particular technique for pruning that we will discuss here is "Alpha-Beta 
Pruning". When this approach is applied to a standard minimax tree, it returns the same move as 
minimax would, but prunes away branches that cannot possibly influence the final decision. 
Alpha-beta pruning can be applied to trees of any depth, and it is often possible to prune entire sub-trees 
rather than just leaves. 

Alpha-beta pruning is a technique for evaluating nodes of a game tree that eliminates unnecessary 
evaluations. It uses two parameters, alpha and beta. 

Alpha: is the value of the best (i.e. highest value) choice we have found so far at any choice point 
along the path for MAX. 

Beta: is the value of the best (i.e. lowest-value) choice we have found so far at any choice point 
along the path for MIN. 

Alpha-beta search updates the values of alpha and beta as it goes along and prunes the remaining 
branches at a node as soon as the value of the current node is known to be worse than the current 
alpha or beta for MAX or MIN respectively. 

An alpha cutoff: 

To apply this technique, one uses a parameter called alpha that represents a lower bound for the 
achievement of the Max player at a given node. 

Let us consider that the current board situation corresponds to the node A in the following figure. 



The minimax method uses a depth-first search strategy in evaluating the descendants of a node. It 
will therefore estimate first the value of the node B. Let us suppose that this value has been 
evaluated to 15, either by using a static evaluation function, or by backing up from descendants 
omitted in the figure. If Max moves to B, it is guaranteed to achieve 15. Therefore 15 is a 
lower bound for the achievement of the Max player (it may still be possible to achieve more, 
depending on the values of the other descendants of A). Therefore, the value of α at node B is 15. 
This value is transmitted upward to the node A and will be used for evaluating the other possible 
moves from A. 

To evaluate the node C, its left-most child D has to be evaluated first. Let us assume that the value 
of D is 10 (this value has been obtained either by applying a static evaluation function directly to D, 
or by backing up values from descendants omitted in the figure). Because this value is less than the 
value of α, the best move for Max is to node B, independent of the value of node E, which need not 
be evaluated. Indeed, if the value of E is greater than 10, Min will move to D, which has the value 
10 for Max. Otherwise, if the value of E is less than 10, Min will move to E, which has a value less 
than 10. So, if Max moves to C, the best it can get is 10, which is less than the value α = 15 that 
it would get by moving to B. Therefore, the best move for Max is to B, independent of 
the value of E. The elimination of the node E is an alpha cutoff. 

One should notice that E may itself have a huge subtree. Therefore, the elimination of E means, in 
fact, the elimination of this subtree. 

A beta cutoff: 

To apply this technique, one uses a parameter called beta that represents an upper bound for the 
achievement of the Max player at a given node. 

In the above tree, the Max player moved to the node B. Now it is the turn of the Min player to 
decide where to move: 



Figure 3.18: Illustration of the beta cut-off. 

The Min player also evaluates its descendants in a depth-first order. 

Let us assume that the value of F has been evaluated to 15. From the point of view of Min, this is 
an upper bound on the value it has to concede to Max (Min may still be able to reach a lower value, 
depending on the values of the other descendants of B). Therefore the value of β at the node F is 
15. This value is transmitted upward to the node B and will be used for evaluating the other possible 
moves from B. 

To evaluate the node G, its left-most child H is evaluated first. Let us assume that the value of H is 
25 (this value has been obtained either by applying a static evaluation function directly to H, or by 
backing up values from descendants omitted in the figure). Because this value is greater than the 
value of β, the best move for Min is to node F, independent of the value of node I, which need not be 
evaluated. Indeed, if the value of I is v > 25, then Max (in G) will move to I. Otherwise, if the value 
of I is less than 25, Max will move to H. So in both cases, the value obtained by Max is at least 25, 
which is greater than β (the best value obtained by Max if Min moves to F). 

Therefore, the best move for Min is at F, independent of the value of I. The elimination of the node 
I is a beta cutoff. 

One should notice that by applying alpha and beta cut-off, one obtains the same results as in the 
case of mini-max, but (in general) with less effort. This means that, in a given amount of time, one 
could search deeper in the game tree than in the case of mini-max. 
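
Both cutoffs can be reproduced with a short sketch. The encoding is an assumption made for the example: a game tree is a nested Python list whose leaves are numbers, nodes B and F are treated as already-evaluated leaves with value 15, and 99 is an arbitrary placeholder for the unexplored nodes E and I. The optional `visited` list only records which leaves were actually evaluated.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of `node` with alpha-beta pruning."""
    if not isinstance(node, list):  # leaf: a static evaluation
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # Min above will never allow this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if beta <= alpha:
            break  # Max above already has something at least as good
    return value


# Alpha cutoff: A (Max) has children B = 15 and C (Min) with children D = 10, E.
seen_a = []
value_a = alphabeta([15, [10, 99]], True, visited=seen_a)
print(value_a, seen_a)  # 15 [15, 10]  (E is never evaluated)

# Beta cutoff: B (Min) has children F = 15 and G (Max) with children H = 25, I.
seen_b = []
value_b = alphabeta([15, [25, 99]], False, visited=seen_b)
print(value_b, seen_b)  # 15 [15, 25]  (I is never evaluated)
```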
Knowledge Representation 

Knowledge: 

Knowledge is a theoretical or practical understanding of a subject or a domain. Knowledge is also 
the sum of what is currently known. 

Knowledge is "the sum of what is known: the body of truth, information, and principles acquired 
by mankind." Or, "Knowledge is what I know, Information is what we know." 

There are many other definitions such as: 

- Knowledge is "information combined with experience, context, interpretation, and reflection. It is 
a high-value form of information that is ready to apply to decisions and actions." (T. Davenport et 
al., 1998) 

- Knowledge is "human expertise stored in a person's mind, gained through experience, and 
interaction with the person's environment." (Sunasee and Sewery, 2002) 

- Knowledge is "information evaluated and organized by the human mind so that it can be used 
purposefully, e.g., conclusions or explanations." (Rousa, 2002) 

Knowledge consists of information that has been: 

- interpreted, 

- categorised, 

- applied, experienced and revised. 

In general, knowledge is more than just data; it consists of facts, ideas, beliefs, heuristics, 
associations, rules, abstractions, relationships and customs. 

Research literature classifies knowledge as follows: 


Classification-based Knowledge » Ability to classify information 

Decision-oriented Knowledge » Choosing the best option 

Descriptive knowledge » State of some world (heuristic) 

Procedural knowledge » How to do something 

Reasoning knowledge » What conclusion is valid in what situation? 

Assimilative knowledge » What its impact is? 

Knowledge Representation 

Knowledge representation (KR) is the study of how knowledge about the world can be represented 
and what kinds of reasoning can be done with that knowledge. Knowledge Representation is the 
method used to encode knowledge in Intelligent Systems. 

Since knowledge is used to achieve intelligent behavior, the fundamental goal of knowledge 
representation is to represent knowledge in a manner as to facilitate inferencing (i.e. drawing 
conclusions) from knowledge. A successful representation of some knowledge must, then, 
be in a form that is understandable by humans, and must cause the system using the 
knowledge to behave as if it knows it. 

Some issues that arise in knowledge representation from an AI perspective are: 

• How do people represent knowledge? 

• What is the nature of knowledge and how do we represent it? 

• Should a representation scheme deal with a particular domain or should it be general 
purpose? 

• How expressive is a representation scheme or formal language? 

• Should the scheme be declarative or procedural? 


Fig: Two entities in Knowledge Representation (figure label: reasoning programs) 

For example: English or natural language is an obvious way of representing and handling 
facts. Logic enables us to represent the fact "Spot is a dog" as dog(Spot). We could 
then state that all dogs have tails with: ∀x: dog(x) → hasatail(x). We can then deduce: 

hasatail(Spot) 

Using an appropriate backward mapping function, the English sentence "Spot has a tail" can 
be generated. 

Properties for Knowledge Representation Systems 

The following properties should be possessed by a knowledge representation system. 

Representational Adequacy 

the ability to represent the required knowledge; 

Inferential Adequacy 

the ability to manipulate the knowledge represented to produce new knowledge 
corresponding to that inferred from the original; 

Inferential Efficiency 

the ability to direct the inferential mechanisms into the most productive directions 
by storing appropriate guides; 

Acquisitional Efficiency 

the ability to acquire new knowledge using automatic methods wherever possible 
rather than reliance on human intervention. 


Formal logic-connectives: 

In logic, a logical connective (also called a logical operator) is a symbol or word used to 
connect two or more sentences (of either a formal or a natural language) in a grammatically 
valid way, such that the compound sentence produced has a truth value dependent on the 
respective truth values of the original sentences. 

Each logical connective can be expressed as a function, called a truth function. For this 
reason, logical connectives are sometimes called truth-functional connectives. 

Commonly used logical connectives include: 

• Negation (not) (¬ or ~) 

• Conjunction (and) (∧, &, or ·) 

• Disjunction (or) (∨ or v) 

• Material implication (if...then) (→, ⇒, or ⊃) 

• Biconditional (if and only if) (iff) (xnor) (↔, ≡, or =) 

For example, the meaning of the statements "it is raining" and "I am indoors" is transformed 
when the two are combined with logical connectives: 

• It is raining and I am indoors (P ∧ Q) 

• If it is raining, then I am indoors (P → Q) 

• It is raining if I am indoors (Q → P) 

• It is raining if and only if I am indoors (P ↔ Q) 

• It is not raining (¬P) 





For statement P = It is raining and Q = I am indoors. 

Truth Table: 


A proposition in general contains a number of variables. For example (P ∨ Q) contains the variables P 
and Q, each of which represents an arbitrary proposition. Thus a proposition takes different values 
depending on the values of its constituent variables. This relationship between the value of a proposition 
and those of its constituent variables can be represented by a table. It tabulates the value of a 
proposition for all possible values of its variables and is called a truth table. 

For example, the following table shows the relationship between the values of P, Q and P ∨ Q: 


OR 

P    Q    (P ∨ Q) 
F    F    F 
F    T    T 
T    F    T 
T    T    T 


Logic : 


Logic is a formal language for representing knowledge such that conclusions can be drawn. Logic 
makes statements about the world which are true (or false) if the state of affairs it represents is the 
case (or not the case). Compared to natural languages (expressive but context sensitive) and 
programming languages (good for concrete data structures but not expressive) logic combines the 
advantages of natural languages and formal languages. Logic is concise, unambiguous, expressive, 
context insensitive, effective for inferences. 


It has syntax, semantics, and proof theory. 

Syntax: Describes the possible configurations that constitute sentences. 

Semantics: Determines what fact in the world the sentence refers to, i.e. the interpretation. Each 
sentence makes a claim about the world (the meaning of the sentence). Semantic properties include truth and 
falsity. 

Syntax is concerned with the rules used for constructing, or transforming the symbols and words of 
a language, as contrasted with the semantics of a language which is concerned with its meaning. 


Proof theory (Inference method): set of rules for generating new sentences that are necessarily 
true given that the old sentences are true. 

We will consider two kinds of logic: propositional logic and first-order logic or, more precisely, 
first-order predicate calculus. Propositional logic is of limited expressiveness but is useful to introduce 
many of the concepts of logic's syntax, semantics and inference procedures. 

Entailment: 


Entailment means that one thing follows from another: 

KB ⊨ α 

Knowledge base KB entails sentence α if and only if α is true in all worlds where KB is true. 
E.g., x + y = 4 entails 4 = x + y. 


Entailment is a relationship between sentences (i.e., syntax) that is based on semantics. 


We can determine whether S ⊨ P by constructing the truth table for S and P: S ⊨ P holds if P is true 
in every row of the truth table where all formulae in S are true. 


Example: 


P        P → Q    Q 
True     True     True 
True     False    False 
False    True     True 
False    True     False 


Therefore {P, P → Q} ⊨ Q. Here, in the only row where both P and P → Q are True, Q is also True. Here, 

S = {P, P → Q} and P = {Q}. 
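
The truth-table test for entailment can be sketched as below. Representing each formula as a Python function of a model (a dictionary from symbol names to truth values) is an assumption made for this example, as is the function name `entails`.

```python
from itertools import product


def entails(premises, conclusion, symbols):
    """True if `conclusion` holds in every model in which all `premises` hold."""
    for values in product([True, False], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if all(p(model) for p in premises) and not conclusion(model):
            return False  # found a row where S is true but the conclusion is false
    return True


def P(m):
    return m["P"]


def Q(m):
    return m["Q"]


def P_implies_Q(m):
    return (not m["P"]) or m["Q"]


print(entails([P, P_implies_Q], Q, ["P", "Q"]))  # True
print(entails([P_implies_Q, Q], P, ["P", "Q"]))  # False
```

The second call fails on the row P = False, Q = True, where both premises hold but P does not.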

Models 

Logicians typically think in terms of models in place of "possible worlds"; models are formally 
structured worlds with respect to which truth can be evaluated. 

m is a model of a sentence α if α is true in m. 

M(α) is the set of all models of α. 

Tautology: 

A formula of propositional logic is a tautology if the formula itself is always true 
regardless of which valuation is used for the propositional variables. 

There are infinitely many tautologies. Examples include: 

• (A ∨ ¬A) ("A or not A"), the law of the excluded middle. This formula has only one 
propositional variable, A. Any valuation for this formula must, by definition, assign A one of 
the truth values true or false, and assign ¬A the other truth value. 

• (A → B) ↔ (¬B → ¬A) ("if A implies B then not-B implies not-A", and vice 
versa), which expresses the law of contraposition. 

• ((A → B) ∧ (B → C)) → (A → C) ("if A implies B and B implies C, then A 
implies C"), which is the principle known as syllogism. 

The definition of tautology can be extended to sentences in predicate logic, which may contain 
quantifiers, unlike sentences of propositional logic. In propositional logic, there is no distinction 
between a tautology and a logically valid formula. In the context of predicate logic, many authors 
define a tautology to be a sentence that can be obtained by taking a tautology of propositional logic 
and uniformly replacing each propositional variable by a first-order formula (one formula per 
propositional variable). The set of such formulas is a proper subset of the set of logically valid 
sentences of predicate logic (which are the sentences that are true in every model). 

There are also propositions that are always false, such as (P ∧ ¬P). Such a proposition is called a 
contradiction. 

A proposition that is neither a tautology nor a contradiction is called a contingency. 
For example (P ∨ Q) is a contingency. 
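
The three categories can be told apart mechanically by collecting the values a formula takes over all valuations. In this sketch a formula is assumed to be a Python function of its propositional variables; `classify` and `implies` are names introduced for the example.

```python
from itertools import product


def classify(formula, n_vars):
    """Label a formula of `n_vars` variables by the set of truth values it can take."""
    results = {formula(*values) for values in product([True, False], repeat=n_vars)}
    if results == {True}:
        return "tautology"
    if results == {False}:
        return "contradiction"
    return "contingency"


def implies(a, b):
    return (not a) or b


print(classify(lambda a: a or not a, 1))    # tautology (excluded middle)
print(classify(lambda p: p and not p, 1))   # contradiction
print(classify(lambda p, q: p or q, 2))     # contingency
print(classify(lambda a, b, c: implies(implies(a, b) and implies(b, c),
                                       implies(a, c)), 3))  # tautology (syllogism)
```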

Validity: 

The term validity in logic (also logical validity) is largely synonymous with logical truth, however 
the term is used in different contexts. Validity is a property of formulae, statements and arguments. 

A logically valid argument is one where the conclusion follows from the premises. An invalid 
argument is where the conclusion does not follow from the premises. A formula of a formal 
language is a valid formula if and only if it is true under every possible interpretation of the 
language. 

Saying that an argument is valid is equivalent to saying that it is logically impossible that 
the premises of the argument are true and the conclusion false. A less precise but intuitively 
clear way of putting this is to say that in a valid argument IF the premises are true, then the 
conclusion must be true. 

An argument that is not valid is said to be “invalid”. 

An example of a valid argument is given by the following well-known syllogism: 

All men are mortal. 

Socrates is a man. 

Therefore, Socrates is mortal. 

What makes this a valid argument is not that it has true premises and a true conclusion, but 
the logical necessity of the conclusion, given the two premises. 

The following argument is of the same logical form but with false premises and a false conclusion, 
and it is equally valid: 

All women are cats. 

All cats are men. 

Therefore, all women are men. 

This argument has false premises and a false conclusion. This brings out the hypothetical character 
of validity. What the validity of these arguments amounts to, is that it assures us the conclusion 
must be true IF the premises are true. 

Thus, an argument is valid if the premises and conclusion follow a logical form. This essentially 
means that the conclusion logically follows from the premises. An argument is valid if and only if 
the truth of its premises entails the truth of its conclusion. It would be self-contradictory to affirm 
the premises and deny the conclusion. 

Deductive Reasoning: 

Deductive reasoning, also called Deductive logic, is reasoning which constructs or 
evaluates deductive arguments. Deductive arguments are attempts to show that a conclusion 
necessarily follows from a set of premises. A deductive argument is valid if the conclusion 
does follow necessarily from the premises, i.e., if the conclusion must be true provided 
that the premises are true. A deductive argument is sound if it is valid AND its premises 
are true. Deductive arguments are valid or invalid, sound or unsound, but are never false or 
true. 

An example of a deductive argument: 

1. All men are mortal 

2. Socrates is a man 

3. Therefore, Socrates is mortal 

The first premise states that all objects classified as 'men' have the attribute 'mortal'. The 
second premise states that 'Socrates' is classified as a man- a member of the set 'men'. The 
conclusion states that 'Socrates' must be mortal because he inherits this attribute from his 
classification as a man. 

Deductive arguments are generally evaluated in terms of their validity and soundness. An 
argument is valid if it is impossible both for its premises to be true and its conclusion to be 
false. An argument can be valid even though the premises are false. 

This is an example of a valid argument. Its premises are false, yet the argument is still 
valid. 


All fire-breathing rabbits live on Mars 
All humans are fire-breathing rabbits 
Therefore, all humans live on Mars 

This argument is valid but not sound. In order for a deductive argument to be sound, the 
deduction must be valid and the premises must all be true. 

Consider another example. 

1. All monkeys are primates 

2. All primates are mammals 

3. Therefore, all monkeys are mammals 

This is a sound argument because it is actually true in the real world. The premises are true 
and so is the conclusion. They logically follow from one another to form a concrete argument 
that can’t be denied. Where validity doesn’t have to do with the actual truthfulness of an 
argument, soundness does. 

A theory of deductive reasoning known as categorical or term logic was developed by 
Aristotle, but was superseded by propositional (sentential) logic and predicate logic. 

Deductive reasoning can be contrasted with inductive reasoning. In cases of inductive 
reasoning, it is possible for the conclusion to be false even though the premises are true and 
the argument's form is cogent. 

Well Formed Formula: (wff) 

It is a syntactic object that can be given a semantic meaning. A formal language can be considered 
to be identical to the set containing all and only its wffs. 

A key use of wffs is in propositional logic and predicate logics such as first-order logic. In 
those contexts, a formula is a string of symbols φ for which it makes sense to ask "is φ true?", 
once any free variables in φ have been instantiated. In formal logic, proofs can be represented 
by sequences of wffs with certain properties, and the final wff in the sequence is what is 
proven. 

The well-formed formulas of propositional calculus are expressions built from propositional 
variables, connectives and parentheses. Their definition begins with the arbitrary choice of a set V of 
propositional variables. The alphabet consists of the letters in V along with the symbols for 
the propositional connectives and parentheses "(" and ")", all of which are assumed to not be 
in V. The wffs will be certain expressions (that is, strings of symbols) over this alphabet. 

The well-formed formulas are inductively defined as follows: 


• Each propositional variable is, on its own, a wff. 
• If φ is a wff, then ¬φ is a wff. 
• If φ and ψ are wffs, and ∘ is any binary connective, then (φ ∘ ψ) is a wff. Here ∘ could be 
∨, ∧, →, or ↔. 
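
The inductive definition translates almost line by line into a recursive check. The encoding below is an assumption for illustration only: a variable is a string from V, a negation is the pair ("not", φ), and a binary formula is a triple (connective, φ, ψ) with the connective named "and", "or", "implies" or "iff".

```python
BINARY = {"and", "or", "implies", "iff"}


def is_wff(expr, variables):
    """Check `expr` against the inductive definition of a propositional wff."""
    if isinstance(expr, str):
        return expr in variables                      # a propositional variable
    if isinstance(expr, tuple) and expr and isinstance(expr[0], str):
        if len(expr) == 2 and expr[0] == "not":
            return is_wff(expr[1], variables)         # negation of a wff
        if len(expr) == 3 and expr[0] in BINARY:
            return is_wff(expr[1], variables) and is_wff(expr[2], variables)
    return False


V = {"P", "Q", "R"}
print(is_wff(("implies", ("and", "P", "Q"), ("not", "R")), V))  # True
print(is_wff(("and", "P"), V))                                  # False: missing operand
print(is_wff("S", V))                                           # False: S is not in V
```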

The WFF for predicate calculus is defined to be the smallest set containing the set of atomic 
WFFs such that the following holds: 

1. ¬φ is a WFF when φ is a WFF; 

2. (φ ∧ ψ) and (φ ∨ ψ) are WFFs when φ and ψ are WFFs; 

3. ∃x φ is a WFF when x is a variable and φ is a WFF; 

4. ∀x φ is a WFF when x is a variable and φ is a WFF (alternatively, ∀x φ could be 
defined as an abbreviation for ¬∃x ¬φ). 

If a formula has no occurrences of ∃x or ∀x, for any variable x, then it is called 
quantifier-free. An existential formula is a string of existential quantifications followed by a 
quantifier-free formula. 

Propositional Logic: 

Propositional logic represents knowledge/information in terms of propositions. Propositions are 
facts and non-facts that can be true or false. Propositions are expressed using ordinary declarative 
sentences. Propositional logic is the simplest logic. 

Syntax: 

The syntax of propositional logic defines the allowable sentences. The atomic sentences (the 
indivisible syntactic elements) consist of a single proposition symbol. Each such symbol stands for a 
proposition that can be true or false. We use symbols like P1, P2 to represent sentences. 

The complex sentences are constructed from simpler sentences using logical connectives. There 
are five connectives in common use: 

¬ (negation), ∧ (conjunction), ∨ (disjunction), ⇒ (implication), ⇔ (biconditional) 

The order of precedence in propositional logic is (from highest to lowest): ¬, ∧, ∨, ⇒, ⇔. 
Sentences of propositional logic are defined as follows: 

If S is a sentence, ¬S is a sentence (negation) 

If S1 and S2 are sentences, S1 ∧ S2 is a sentence (conjunction) 

If S1 and S2 are sentences, S1 ∨ S2 is a sentence (disjunction) 

If S1 and S2 are sentences, S1 ⇒ S2 is a sentence (implication) 

If S1 and S2 are sentences, S1 ⇔ S2 is a sentence (biconditional) 





Formal grammar for propositional logic can be given as below: 


Sentence → AtomicSentence | ComplexSentence 

AtomicSentence → True | False | Symbol 

Symbol → P | Q | R ... 

ComplexSentence → ¬Sentence 
                | (Sentence ∧ Sentence) 
                | (Sentence ∨ Sentence) 
                | (Sentence ⇒ Sentence) 
                | (Sentence ⇔ Sentence) 


Semantics: 

Each model specifies true/false for each proposition symbol. 
Rules for evaluating truth with respect to a model: 

¬S is true iff S is false 
S1 ∧ S2 is true iff S1 is true and S2 is true 
S1 ∨ S2 is true iff S1 is true or S2 is true 
S1 ⇒ S2 is true iff S1 is false or S2 is true 
S1 ⇔ S2 is true iff S1 ⇒ S2 is true and S2 ⇒ S1 is true 

Truth Table showing the evaluation of semantics of complex sentences: 


P        Q        ¬P       P ∧ Q    P ∨ Q    P ⇒ Q    P ⇔ Q 
false    false    true     false    false    true     true 
false    true     true     false    true     true     false 
true     false    false    false    true     false    false 
true     true     false    true     true     true     true 
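
The evaluation rules can be applied recursively to reproduce this table. In the sketch below, sentences are assumed to be nested tuples such as ("implies", "P", "Q"), with symbols given as strings; this encoding and the name `evaluate` are choices made for the example.

```python
from itertools import product


def evaluate(sentence, model):
    """Truth value of `sentence` in `model` (a dict from symbol to bool)."""
    if isinstance(sentence, str):
        return model[sentence]                 # atomic sentence: look up the symbol
    op, *args = sentence
    if op == "not":
        return not evaluate(args[0], model)
    left, right = (evaluate(a, model) for a in args)
    if op == "and":
        return left and right
    if op == "or":
        return left or right
    if op == "implies":
        return (not left) or right
    if op == "iff":
        return left == right                   # same as (S1 => S2) and (S2 => S1)
    raise ValueError(f"unknown connective: {op}")


columns = [("not", "P"), ("and", "P", "Q"), ("or", "P", "Q"),
           ("implies", "P", "Q"), ("iff", "P", "Q")]
table = []
for p, q in product([False, True], repeat=2):
    row = [evaluate(c, {"P": p, "Q": q}) for c in columns]
    table.append(row)
    print(p, q, row)
```

The four printed rows appear in the same order as the rows of the truth table.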


Logical equivalence: 

Two sentences α and β are logically equivalent (α ≡ β) iff they are true in the same set of models, 
or, equivalently, iff α ⊨ β and β ⊨ α. 


(α ∧ β) ≡ (β ∧ α)                          commutativity of ∧ 
(α ∨ β) ≡ (β ∨ α)                          commutativity of ∨ 
((α ∧ β) ∧ γ) ≡ (α ∧ (β ∧ γ))              associativity of ∧ 
((α ∨ β) ∨ γ) ≡ (α ∨ (β ∨ γ))              associativity of ∨ 
¬(¬α) ≡ α                                  double-negation elimination 
(α ⇒ β) ≡ (¬β ⇒ ¬α)                        contraposition 
(α ⇒ β) ≡ (¬α ∨ β)                         implication elimination 
(α ⇔ β) ≡ ((α ⇒ β) ∧ (β ⇒ α))              biconditional elimination 
¬(α ∧ β) ≡ (¬α ∨ ¬β)                       de Morgan 
¬(α ∨ β) ≡ (¬α ∧ ¬β)                       de Morgan 
(α ∧ (β ∨ γ)) ≡ ((α ∧ β) ∨ (α ∧ γ))        distributivity of ∧ over ∨ 
(α ∨ (β ∧ γ)) ≡ ((α ∨ β) ∧ (α ∨ γ))        distributivity of ∨ over ∧ 
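
Any of these equivalences can be checked by comparing the two sides in every model. The helper `equivalent` below is a hypothetical name for this illustration, and formulas are again assumed to be Python functions of their variables.

```python
from itertools import product


def equivalent(f, g, n_vars):
    """True if f and g have the same truth value in every model."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=n_vars))


def implies(a, b):
    return (not a) or b


de_morgan = equivalent(lambda a, b: not (a and b),
                       lambda a, b: (not a) or (not b), 2)
contraposition = equivalent(implies, lambda a, b: implies(not b, not a), 2)
distributivity = equivalent(lambda a, b, c: a and (b or c),
                            lambda a, b, c: (a and b) or (a and c), 3)
converse = equivalent(implies, lambda a, b: implies(b, a), 2)

print(de_morgan, contraposition, distributivity)  # True True True
print(converse)  # False: an implication is not equivalent to its converse
```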


Validity: 

A sentence is valid if it is true in all models,

e.g., True, A ∨ ¬A, A ⇒ A, (A ∧ (A ⇒ B)) ⇒ B

Valid sentences are also known as tautologies. Every valid sentence is logically equivalent to True.

Satisfiability:

A sentence is satisfiable if it is true in some model

- e.g., A ∨ B, C

A sentence is unsatisfiable if it is true in no models

- e.g., A ∧ ¬A

Validity and satisfiability are related concepts:

- α is valid iff ¬α is unsatisfiable

- α is satisfiable iff ¬α is not valid

Satisfiability is connected to inference via the following:

- KB ⊨ α if and only if (KB ∧ ¬α) is unsatisfiable
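
For small sets of symbols, these definitions can be checked directly by enumerating models. The following is a minimal sketch under one assumption: a sentence is encoded as a Python function that takes a model (a dict from symbol names to booleans) and returns its truth value. The function names are illustrative only.

```python
from itertools import product

def models(symbols):
    """Yield every assignment of True/False to the given symbols."""
    for values in product([False, True], repeat=len(symbols)):
        yield dict(zip(symbols, values))

def is_valid(sentence, symbols):
    # Valid: true in all models.
    return all(sentence(m) for m in models(symbols))

def is_satisfiable(sentence, symbols):
    # Satisfiable: true in at least one model.
    return any(sentence(m) for m in models(symbols))

def entails(kb, alpha, symbols):
    # KB entails alpha iff (KB and not alpha) is unsatisfiable.
    return not is_satisfiable(lambda m: kb(m) and not alpha(m), symbols)
```

For example, entails(lambda m: m["A"] and ((not m["A"]) or m["B"]), lambda m: m["B"], ["A", "B"]) checks that A together with A ⇒ B entails B. Enumeration is exponential in the number of symbols, so this is only practical for small examples.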


Inference rules in Propositional Logic 

Inference rules are the standard patterns of inference that can be applied to derive conclusions 
from given facts. 

Modus Ponens

α ⇒ β,   α
----------
    β

And-elimination

α ∧ β
-----
  α

Monotonicity: the set of entailed sentences can only increase as information is added to the
knowledge base.

For any sentences α and β, if KB ⊨ α then KB ∧ β ⊨ α.

Resolution 

Unit resolution rule:

The unit resolution rule takes a clause (a disjunction of literals) and a literal, and produces a new
clause. A single literal is also called a unit clause.

l_1 ∨ ... ∨ l_k,   m
-----------------------------------------------
l_1 ∨ ... ∨ l_(i-1) ∨ l_(i+1) ∨ ... ∨ l_k

where l_i and m are complementary literals.

Generalized resolution rule:

The generalized resolution rule takes two clauses of any length and produces a new clause as below.

l_1 ∨ ... ∨ l_k,   m_1 ∨ ... ∨ m_n
------------------------------------------------------------------------------------------
l_1 ∨ ... ∨ l_(i-1) ∨ l_(i+1) ∨ ... ∨ l_k ∨ m_1 ∨ ... ∨ m_(j-1) ∨ m_(j+1) ∨ ... ∨ m_n

where l_i and m_j are complementary literals.

For example:

l_1 ∨ l_2,   ¬l_2 ∨ l_3
-----------------------
l_1 ∨ l_3


Resolution uses CNF (conjunctive normal form):

- a conjunction of disjunctions of literals (clauses)

The resolution rule is sound:

- only entailed sentences are derived

Resolution is complete in the sense that it can always be used to either confirm or refute a
sentence (it cannot be used to enumerate true sentences).

Conversion to CNF:

A sentence that is expressed as a conjunction of disjunctions of literals is said to be in conjunctive
normal form (CNF). A sentence in CNF that contains only k literals per clause is said to be in k-CNF.

Algorithm:

Eliminate ⇔, rewriting P ⇔ Q as (P ⇒ Q) ∧ (Q ⇒ P)

Eliminate ⇒, rewriting P ⇒ Q as ¬P ∨ Q

Use De Morgan's laws to push ¬ inwards:

- rewrite ¬(P ∨ Q) as ¬P ∧ ¬Q

- rewrite ¬(P ∧ Q) as ¬P ∨ ¬Q

Eliminate double negations: rewrite ¬¬P as P

Use the distributive laws to get CNF:

- rewrite (P ∧ Q) ∨ R as (P ∨ R) ∧ (Q ∨ R)

Flatten nested clauses:

- rewrite (P ∧ Q) ∧ R as P ∧ Q ∧ R

- rewrite (P ∨ Q) ∨ R as P ∨ Q ∨ R

Example: Let's illustrate the conversion to CNF by using an example.

B ⇔ (A ∨ C)

• Eliminate ⇔, replacing α ⇔ β with (α ⇒ β) ∧ (β ⇒ α).

- (B ⇒ (A ∨ C)) ∧ ((A ∨ C) ⇒ B)

• Eliminate ⇒, replacing α ⇒ β with ¬α ∨ β.

- (¬B ∨ A ∨ C) ∧ (¬(A ∨ C) ∨ B)

• Move ¬ inwards using de Morgan's rules and double-negation:

- (¬B ∨ A ∨ C) ∧ ((¬A ∧ ¬C) ∨ B)

• Apply the distributivity law (∨ over ∧) and flatten:

- (¬B ∨ A ∨ C) ∧ (¬A ∨ B) ∧ (¬C ∨ B)
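
The conversion steps can be sketched in code. This is a minimal illustration, not a definitive implementation: the tuple encoding of formulas, the "~" prefix for negated literals, and all function names are assumptions made for this example. Tautological clauses are not simplified away.

```python
# Assumed encoding: "A" is a symbol; ("not", f); ("and", f, g); ("or", f, g);
# ("imp", f, g) for =>; ("iff", f, g) for <=>.

def eliminate_arrows(f):
    """Rewrite <=> and => using only not, and, or."""
    if isinstance(f, str):
        return f
    if f[0] == "not":
        return ("not", eliminate_arrows(f[1]))
    a, b = eliminate_arrows(f[1]), eliminate_arrows(f[2])
    if f[0] == "iff":
        return ("and", ("or", ("not", a), b), ("or", ("not", b), a))
    if f[0] == "imp":
        return ("or", ("not", a), b)
    return (f[0], a, b)

def push_not(f):
    """Move negations inward (De Morgan) and remove double negations."""
    if isinstance(f, str):
        return f
    if f[0] == "not":
        g = f[1]
        if isinstance(g, str):
            return f                              # already a literal
        if g[0] == "not":
            return push_not(g[1])                 # double negation
        flipped = "or" if g[0] == "and" else "and"
        return (flipped, push_not(("not", g[1])), push_not(("not", g[2])))
    return (f[0], push_not(f[1]), push_not(f[2]))

def to_clauses(f):
    """Distribute or over and; return a set of clauses (frozensets of literals)."""
    if isinstance(f, str):
        return {frozenset([f])}
    if f[0] == "not":
        return {frozenset(["~" + f[1]])}
    left, right = to_clauses(f[1]), to_clauses(f[2])
    if f[0] == "and":
        return left | right
    # or: merge every clause on the left with every clause on the right
    return {l | r for l in left for r in right}

def cnf(f):
    return to_clauses(push_not(eliminate_arrows(f)))
```

Calling cnf(("iff", "B", ("or", "A", "C"))) yields the three clauses {~B, A, C}, {~A, B} and {~C, B}, matching the worked example. Representing clauses as sets makes the flattening step implicit.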

Resolution algorithm

- Convert the KB into CNF

- Add the negation of the sentence to be entailed to the KB, i.e. form (KB ∧ ¬α)

- Then apply the resolution rule to the resulting clauses.

- The process continues until:

- There are no new clauses that can be added

Hence KB does not entail α

- Two clauses resolve to yield the empty clause.

Hence KB does entail α
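
The loop described above can be sketched as follows. This is an illustrative sketch only: clauses are assumed to be frozensets of string literals with "~" marking negation, and the function names are invented for this example.

```python
from itertools import combinations

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return all resolvents of two clauses (frozensets of literals)."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents

def resolution_entails(kb_clauses, negated_query_clauses):
    """True iff the empty clause is derivable from KB plus the negated query."""
    clauses = set(kb_clauses) | set(negated_query_clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:              # empty clause: contradiction found
                    return True
                new.add(r)
        if new <= clauses:             # nothing new can be added
            return False
        clauses |= new
```

For instance, with the clauses (¬B ∨ A ∨ C), (¬A ∨ B), (¬C ∨ B) and ¬B as the KB and the clause A as the negated query, the function returns True, so the KB entails ¬A. The procedure terminates because only finitely many clauses can be built from a finite set of literals.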


Example: Consider the knowledge base given as: KB = (B ⇔ (A ∨ C)) ∧ ¬B

Prove that ¬A can be inferred from the above KB by using resolution.

Solution:

First, convert the KB into CNF:

(B ⇒ (A ∨ C)) ∧ ((A ∨ C) ⇒ B) ∧ ¬B

(¬B ∨ A ∨ C) ∧ (¬(A ∨ C) ∨ B) ∧ ¬B

(¬B ∨ A ∨ C) ∧ ((¬A ∧ ¬C) ∨ B) ∧ ¬B

(¬B ∨ A ∨ C) ∧ (¬A ∨ B) ∧ (¬C ∨ B) ∧ ¬B

Add the negation of the sentence to be inferred to the KB.
Now the KB contains the following sentences, all in CNF:

(¬B ∨ A ∨ C)

(¬A ∨ B)

(¬C ∨ B)

¬B

A (negation of the conclusion to be proved)

Now use the resolution algorithm:

Resolving (¬A ∨ B) with A gives B. Resolving B with ¬B gives the empty clause.
Hence KB entails ¬A.

Resolution: More Examples

1. KB = {(G ∨ H) → (¬J ∧ ¬K), G}. Show that KB ⊢ ¬J

Solution:

Clausal form of (G ∨ H) → (¬J ∧ ¬K) is

{¬G ∨ ¬J, ¬H ∨ ¬J, ¬G ∨ ¬K, ¬H ∨ ¬K}

1. ¬G ∨ ¬J [Premise]
2. ¬H ∨ ¬J [Premise]
3. ¬G ∨ ¬K [Premise]
4. ¬H ∨ ¬K [Premise]
5. G [Premise]
6. J [¬Conclusion]
7. ¬G [1, 6 Resolution]
8. □ [5, 7 Resolution]

Hence KB entails ¬J

2. KB = {P → ¬Q, ¬Q → R}. Show that KB ⊢ P → R

Solution:

1. ¬P ∨ ¬Q [Premise]
2. Q ∨ R [Premise]
3. P [¬Conclusion]
4. ¬R [¬Conclusion]
5. ¬Q [1, 3 Resolution]
6. R [2, 5 Resolution]
7. □ [4, 6 Resolution]

Hence, KB ⊢ P → R

3. ⊢ ((P ∨ Q) ∧ ¬P) → Q

Clausal form of ¬(((P ∨ Q) ∧ ¬P) → Q) is {P ∨ Q, ¬P, ¬Q}

1. P ∨ Q [¬Conclusion]
2. ¬P [¬Conclusion]
3. ¬Q [¬Conclusion]
4. Q [1, 2 Resolution]
5. □ [3, 4 Resolution]

Forward and backward chaining 

The completeness of resolution makes it a very important inference method. But in many practical
situations the full power of resolution is not needed. Real-world knowledge bases often contain only
clauses of a restricted kind called Horn clauses. A Horn clause is a disjunction of literals with at most
one positive literal.

Three important properties of Horn clauses are:

- They can be written as an implication.

- Inference can be done through forward chaining and backward chaining.

- Deciding entailment can be done in time linear in the size of the knowledge base.

Forward chaining:

Idea: fire any rule whose premises are satisfied in the KB,

- add its conclusion to the KB, until the query is found

P ⇒ Q
L ∧ M ⇒ P
B ∧ L ⇒ M
A ∧ P ⇒ L
A ∧ B ⇒ L
A
B

Prove that Q can be inferred from the above KB.

Solution:

A and B are known facts, so A ∧ B ⇒ L fires and L is added.
B and L are now known, so B ∧ L ⇒ M fires and M is added.
L and M are now known, so L ∧ M ⇒ P fires and P is added.
P is now known, so P ⇒ Q fires and Q is added. Hence Q is inferred.

Backward chaining:

Idea: work backwards from the query q. To prove q by BC,

- check if q is known already, or
- prove by BC all premises of some rule concluding q

For example, for the above KB (as in forward chaining):

P ⇒ Q
L ∧ M ⇒ P
B ∧ L ⇒ M
A ∧ P ⇒ L
A ∧ B ⇒ L
A
B

Prove that Q can be inferred from the above KB.

Solution:

We know P ⇒ Q, so try to prove P.
L ∧ M ⇒ P

Try to prove L and M.
B ∧ L ⇒ M
A ∧ P ⇒ L

Try to prove B, L, A and P.

A and B are already known; since A ∧ B ⇒ L, L is also known.
Since B ∧ L ⇒ M, M is also known.
Since L ∧ M ⇒ P, P is known, and since P ⇒ Q, Q is proved.

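
Both chaining strategies can be sketched on the same KB. The code below is a simple illustration under stated assumptions: rules are encoded as (set of premises, conclusion) pairs, the names RULES, FACTS, forward_chain and backward_chain are invented here, and the forward chainer is a plain fixed-point loop rather than the linear-time algorithm.

```python
# Assumed encoding of the KB: each rule is (premises, conclusion).
RULES = [
    ({"P"}, "Q"),
    ({"L", "M"}, "P"),
    ({"B", "L"}, "M"),
    ({"A", "P"}, "L"),
    ({"A", "B"}, "L"),
]
FACTS = {"A", "B"}

def forward_chain(rules, facts, query):
    """Fire rules whose premises are all known until the query appears or nothing changes."""
    known = set(facts)
    changed = True
    while changed and query not in known:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return query in known

def backward_chain(rules, facts, query, visiting=frozenset()):
    """Prove the query by proving all premises of some rule that concludes it."""
    if query in facts:
        return True
    if query in visiting:              # already a pending goal: avoid looping
        return False
    visiting = visiting | {query}
    return any(
        all(backward_chain(rules, facts, p, visiting) for p in premises)
        for premises, conclusion in rules
        if conclusion == query
    )
```

Both forward_chain(RULES, FACTS, "Q") and backward_chain(RULES, FACTS, "Q") return True. The visiting set is what stops the backward chainer from cycling through A ∧ P ⇒ L while P is itself a pending goal.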
First-Order Logic 

Pros and cons of propositional logic

Propositional logic is declarative

Propositional logic allows partial/disjunctive/negated information
  o (unlike most data structures and databases)

Propositional logic is compositional:
  o meaning of B ∧ P is derived from meaning of B and of P

Meaning in propositional logic is context-independent
  o (unlike natural language, where meaning depends on context)

Propositional logic has very limited expressive power
  o (unlike natural language)


Propositional logic assumes the world contains facts, whereas first-order logic (like natural 
language) assumes the world contains: 

- Objects: people, houses, numbers, colors, baseball games, wars,... 

- Relations: red, round, prime, brother of, bigger than, part of, comes between,... 

- Functions: father of, best friend, one more than, plus,... 

Logics in General 

The primary difference between PL and FOPL is their ontological commitment: 

Ontological Commitment: What exists in the world (TRUTH)

- PL: facts hold or do not hold.

- FOPL: objects with relations between them that hold or do not hold

Another difference is:

Epistemological Commitment: What an agent believes about facts (BELIEF)

Language             Ontological Commitment             Epistemological Commitment
Propositional logic  facts                              true/false/unknown
First-order logic    facts, objects, relations          true/false/unknown
Temporal logic       facts, objects, relations, times   true/false/unknown
Probability theory   facts                              degree of belief ∈ [0, 1]
Fuzzy logic          degree of truth ∈ [0, 1]           known interval value

FOPL: Syntax


Predicate Logic: Syntax

Sentence → AtomicSentence
         | (Sentence Connective Sentence)
         | Quantifier Variable, ... Sentence
         | ¬Sentence
AtomicSentence → Predicate(Term, ...) | Term = Term
Term → Function(Term, ...) | Constant | Variable
Connective → ∧ | ∨ | ⇒ | ⇔
Quantifier → ∀ | ∃
Constant → A, B, C, X1, X2, Jim, Jack
Variable → a, b, c, x1, x2, counter, position, ...
Predicate → Adjacent-To, Younger-Than, HasColor, ...
Function → Father-Of, Square-Position, Sqrt, Cosine

Ambiguities are resolved through precedence or parentheses.


Representing knowledge in first-order logic 

The objects from the real world are represented by constant symbols (a,b,c,...). For instance, the 
symbol "Tom" may represent a certain individual called Tom. 

Properties of objects may be represented by predicates applied to those objects (P(a), ...): e.g 
"male(Tom)" represents that Tom is a male. 

Relationships between objects are represented by predicates with more arguments: "father(Tom, 
Bob)" represents the fact that Tom is the father of Bob. 

The value of a predicate is one of the boolean constants T (i.e. true) or F (i.e. false). "father(Tom,
Bob) = T" means that the sentence "Tom is the father of Bob" is true. "father(Tom, Bob) = F" means
that the sentence "Tom is the father of Bob" is false.

Besides constants, the arguments of the predicates may be functions (f,g,...) or variables (x,y,...). 

Function symbols denote mappings from elements of a domain (or tuples of elements of domains)
to elements of a domain. For instance, weight is a function that maps objects to their weight:
weight(Tom) = 150. Therefore the predicate greater-than(weight(Bob), 100) means that the weight of Bob
is greater than 100. The arguments of a function may themselves be functions.

Variable symbols represent potentially any element of a domain and allow the formulation of
general statements about the elements of the domain.

The quantifiers ∀ and ∃ are used to build new formulas from old ones.

"∃x P(x)" expresses that there is at least one element of the domain that makes P(x) true.

"∃x mother(x, Bob)" means that there is an x such that x is the mother of Bob or, otherwise stated, Bob
has a mother.

"∀x P(x)" expresses that for all elements of the domain P(x) is true.

Quantifiers 

Quantifiers allow us to express properties of collections of objects instead of enumerating objects by name.
The two quantifiers are:

Universal: "for all" ∀
Existential: "there exists" ∃

Universal quantification:

∀ <variables> <sentence>

E.g.: Everyone at UAB is smart:

∀x At(x, UAB) ⇒ Smart(x)

∀x P is true in a model m iff P is true for all x in the model

Roughly speaking, equivalent to the conjunction of instantiations of P

(At(KingJohn, UAB) ⇒ Smart(KingJohn)) ∧ (At(Richard, UAB) ⇒ Smart(Richard)) ∧ (At(UAB, UAB) ⇒
Smart(UAB)) ∧ ...

Typically, ⇒ is the main connective with ∀

- A universal quantifier is also equivalent to a set of implications over all objects

Common mistake: using ∧ as the main connective with ∀:

∀x At(x, UAB) ∧ Smart(x)

means "Everyone is at UAB and everyone is smart"

Existential quantification:

∃ <variables> <sentence>

Someone at UAB is smart:

∃x At(x, UAB) ∧ Smart(x)

∃x P is true in a model m iff P is true for at least one x in the model

Roughly speaking, equivalent to the disjunction of instantiations of P

(At(KingJohn, UAB) ∧ Smart(KingJohn)) ∨ (At(Richard, UAB) ∧ Smart(Richard))
∨ (At(UAB, UAB) ∧ Smart(UAB)) ∨ ...

Typically, ∧ is the main connective with ∃

Common mistake: using ⇒ as the main connective with ∃:

∃x At(x, UAB) ⇒ Smart(x) is true if there is anyone who is not at UAB!
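
Over a finite domain, the two quantifiers reduce to Python's all and any, which makes the two common mistakes easy to see. The sketch below is illustrative: the function evaluate, its dictionary keys, and the sample sets of who is at UAB and who is smart are all assumptions made for this example.

```python
def evaluate(domain, at_uab, smart):
    """Evaluate four quantified sentences over a finite domain."""
    def at(x):
        return x in at_uab

    def is_smart(x):
        return x in smart

    return {
        # forall x: At(x, UAB) => Smart(x)   (the intended reading)
        "forall_implies": all((not at(x)) or is_smart(x) for x in domain),
        # forall x: At(x, UAB) and Smart(x)  (common mistake)
        "forall_and": all(at(x) and is_smart(x) for x in domain),
        # exists x: At(x, UAB) and Smart(x)  (the intended reading)
        "exists_and": any(at(x) and is_smart(x) for x in domain),
        # exists x: At(x, UAB) => Smart(x)   (common mistake)
        "exists_implies": any((not at(x)) or is_smart(x) for x in domain),
    }
```

With domain ["KingJohn", "Richard", "UAB"], only KingJohn at UAB and nobody smart, "exists_and" is False but "exists_implies" is still True, because Richard is not at UAB and so satisfies the implication vacuously.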

FOPL: Semantics

An interpretation is required to give semantics to first-order logic. The interpretation is defined over
a non-empty "domain of discourse" (a set of objects). The truth of any formula depends on the interpretation.

The interpretation provides, for each:

constant symbol: an object in the domain

function symbol: a function from domain tuples to the domain

predicate symbol: a relation over the domain (a set of tuples)

Then we define:

universal quantifier: ∀x P(x) is True iff P(a) is True for all assignments of domain elements a to x

existential quantifier: ∃x P(x) is True iff P(a) is True for at least one assignment of a domain
element a to x

FOPL: Inference (Inference in first-order logic) 

First order inference can be done by converting the knowledge base to PL and using propositional 
inference. 

- How to convert universal quantifiers? 

- Replace variable by ground term. 

- How to convert existential quantifiers? 

- Skolemization.

Universal instantiation (UI)

Substitute a ground term (a term without variables) for the variables.

For example, consider the following KB:

∀x King(x) ∧ Greedy(x) ⇒ Evil(x)

King(John)

Greedy(John)

Brother(Richard, John)

Its UI is:

King(John) ∧ Greedy(John) ⇒ Evil(John)

King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard)

King(John)

Greedy(John)

Brother(Richard, John)

Note: Remove universally quantified sentences after universal instantiation.
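
Universal instantiation over a finite set of constants can be sketched with plain string templates. This is only an illustration: the function instantiate and the "{x}" placeholder convention are assumptions, and real systems operate on parsed terms rather than strings.

```python
from itertools import product

def instantiate(template, variables, constants):
    """Return every ground instance of a sentence template over the given constants."""
    instances = []
    for values in product(constants, repeat=len(variables)):
        binding = dict(zip(variables, values))
        instances.append(template.format(**binding))
    return instances

# The universally quantified rule from the KB, with x as a placeholder.
RULE = "King({x}) & Greedy({x}) => Evil({x})"
GROUND_RULES = instantiate(RULE, ["x"], ["John", "Richard"])
```

GROUND_RULES then holds one ground sentence for John and one for Richard, which replace the quantified rule in the propositionalized KB.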

Existential instantiation (EI)

For any sentence α and variable v in it, introduce a constant that does not appear in the KB (called a
Skolem constant) and substitute that constant for v.

E.g.: Consider the sentence ∃x Crown(x) ∧ OnHead(x, John)

After EI,

Crown(C1) ∧ OnHead(C1, John), where C1 is a Skolem constant.

Towards Resolution for FOPL: 

Based on resolution for propositional logic 
Extended syntax: allow variables and quantifiers 
Define "clausal form" for first-order logic formulae (CNF) 

Eliminate quantifiers from clausal forms 

Adapt resolution procedure to cope with variables (unification) 

Conversion to CNF: 

1. Eliminate implications and bi-implications as in the propositional case

2. Move negations inward using De Morgan's laws

plus rewriting ¬∀xP as ∃x¬P and ¬∃xP as ∀x¬P

3. Eliminate double negations

4. Rename bound variables if necessary so each only occurs once

e.g. ∀xP(x) ∨ ∃xQ(x) becomes ∀xP(x) ∨ ∃yQ(y)

5. Use equivalences to move quantifiers to the left

e.g. ∀xP(x) ∧ Q becomes ∀x (P(x) ∧ Q) where x is not in Q

e.g. ∀xP(x) ∧ ∃yQ(y) becomes ∀x∃y (P(x) ∧ Q(y))

6. Skolemise (replace each existentially quantified variable by a new term)

∃xP(x) becomes P(a0) using a Skolem constant a0, since ∃x occurs at the outermost level

∀x∃yP(x, y) becomes ∀xP(x, f0(x)) using a Skolem function f0, since ∃y occurs within ∀x

7. The formula now has only universal quantifiers and all are at the left of the formula: drop
them

8. Use distribution laws to get CNF and then clausal form

Example:

1.) ∀x [∀yP(x, y) → ¬∀y(Q(x, y) → R(x, y))]

Solution:

1. ∀x [¬∀yP(x, y) ∨ ¬∀y(¬Q(x, y) ∨ R(x, y))]

2, 3. ∀x [∃y¬P(x, y) ∨ ∃y(Q(x, y) ∧ ¬R(x, y))]

4. ∀x [∃y¬P(x, y) ∨ ∃z(Q(x, z) ∧ ¬R(x, z))]

5. ∀x∃y∃z [¬P(x, y) ∨ (Q(x, z) ∧ ¬R(x, z))]

6. ∀x [¬P(x, f(x)) ∨ (Q(x, g(x)) ∧ ¬R(x, g(x)))]

7. ¬P(x, f(x)) ∨ (Q(x, g(x)) ∧ ¬R(x, g(x)))

8. (¬P(x, f(x)) ∨ Q(x, g(x))) ∧ (¬P(x, f(x)) ∨ ¬R(x, g(x)))

8. {¬P(x, f(x)) ∨ Q(x, g(x)), ¬P(x, f(x)) ∨ ¬R(x, g(x))}

2.) ¬∃x∀y∀z ((P(y) ∨ Q(z)) → (P(x) ∨ Q(x)))

Solution:

1. ¬∃x∀y∀z (¬(P(y) ∨ Q(z)) ∨ P(x) ∨ Q(x))

2. ∀x¬∀y∀z (¬(P(y) ∨ Q(z)) ∨ P(x) ∨ Q(x))

2. ∀x∃y¬∀z (¬(P(y) ∨ Q(z)) ∨ P(x) ∨ Q(x))

2. ∀x∃y∃z¬(¬(P(y) ∨ Q(z)) ∨ P(x) ∨ Q(x))

2. ∀x∃y∃z ((P(y) ∨ Q(z)) ∧ ¬(P(x) ∨ Q(x)))

6. ∀x ((P(f(x)) ∨ Q(g(x))) ∧ ¬P(x) ∧ ¬Q(x))

7. (P(f(x)) ∨ Q(g(x))) ∧ ¬P(x) ∧ ¬Q(x)

8. {P(f(x)) ∨ Q(g(x)), ¬P(x), ¬Q(x)}

Unification: 

A unifier of two atomic formulae is a substitution of terms for variables that makes them identical. 

- Each variable has at most one associated term 

- Substitutions are applied simultaneously 

Unifier of P(x, f(a), z) and P(z, z, u): {x/f(a), z/f(a), u/f(a)}

We can get the inference immediately if we can find a substitution θ such that King(x) and Greedy(x)
match King(John) and Greedy(y)

θ = {x/John, y/John} works

Unify(α, β) = θ if αθ = βθ

p                 q                     θ
Knows(John, x)    Knows(John, Jane)     {x/Jane}
Knows(John, x)    Knows(y, OJ)          {x/OJ, y/John}
Knows(John, x)    Knows(y, Mother(y))   {y/John, x/Mother(John)}
Knows(John, x)    Knows(x, OJ)          fail

The last unification fails due to overlap of variables: x cannot take the values John and OJ at the
same time.

We can avoid this problem by renaming variables to avoid name clashes (standardizing apart).

E.g.

Unify(Knows(John, x), Knows(z, OJ)) = {x/OJ, z/John}

Another complication:

Unification of Knows(John, x) and Knows(y, z) gives θ = {y/John, x/z} or θ = {y/John, x/John, z/John}

The first unifier gives the result Knows(John, z) and the second unifier gives the result Knows(John, John).
The second can be obtained from the first by substituting John in place of z. The first unifier is more general
than the second.

There is a single most general unifier (MGU) that is unique up to renaming of variables.

MGU = {y/John, x/z}
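
A most general unifier can be computed with a short recursive procedure. The sketch below makes several assumptions that are not in the notes: variables are strings starting with a lowercase letter, constants are strings starting with an uppercase letter, and compound terms or atoms are tuples such as ("Knows", "John", "x"). All function names are illustrative, and the resulting bindings may refer to other bound variables (apply substitute to resolve them fully).

```python
def is_var(t):
    return isinstance(t, str) and t[0].islower()

def substitute(t, theta):
    """Apply the substitution theta to term t, following chains of bindings."""
    if is_var(t):
        return substitute(theta[t], theta) if t in theta else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, theta) for a in t[1:])
    return t

def occurs(var, t, theta):
    """Occurs check: does var appear inside t once theta is applied?"""
    t = substitute(t, theta)
    if t == var:
        return True
    return isinstance(t, tuple) and any(occurs(var, a, theta) for a in t[1:])

def unify(x, y, theta=None):
    """Return a most general unifier as a dict, or None if unification fails."""
    theta = {} if theta is None else theta
    x, y = substitute(x, theta), substitute(y, theta)
    if x == y:
        return theta
    if is_var(x):
        return None if occurs(x, y, theta) else {**theta, x: y}
    if is_var(y):
        return unify(y, x, theta)
    if (isinstance(x, tuple) and isinstance(y, tuple)
            and x[0] == y[0] and len(x) == len(y)):
        for a, b in zip(x[1:], y[1:]):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None
```

For example, unify(("Knows", "John", "x"), ("Knows", "y", "z")) returns {"y": "John", "x": "z"}, the MGU, while unify(("Knows", "John", "x"), ("Knows", "x", "OJ")) returns None because x would have to be both John and OJ.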



First-Order Resolution 


For clauses P ∨ Q and ¬Q' ∨ R with Q, Q' atomic formulae:

P ∨ Q,   ¬Q' ∨ R
----------------
    (P ∨ R)θ

where θ is a most general unifier for Q and Q'

(P ∨ R)θ is the resolvent of the two clauses

Applying Resolution Refutation 

• Negate query to be proven (resolution is a refutation system) 

• Convert knowledge base and negated query into CNF and extract clauses 

• Repeatedly apply resolution to clauses or copies of clauses until either the empty clause 
(contradiction) is derived or no more clauses can be derived (a copy of a clause is the clause 
with all variables renamed) 

• If the empty clause is derived, answer 'yes' (query follows from knowledge base), otherwise 
answer 'no' (query does not follow from knowledge base) 


Resolution: Examples 

1.) ⊢ ∃x (P(x) → ∀xP(x))

Solution:

Add the negation of the conclusion and convert it into CNF:
¬∃x (P(x) → ∀xP(x))

1, 2. ∀x ¬(¬P(x) ∨ ∀xP(x))

2. ∀x (¬¬P(x) ∧ ¬∀xP(x))

2, 3. ∀x (P(x) ∧ ∃x¬P(x))

4. ∀x (P(x) ∧ ∃y¬P(y))

5. ∀x∃y (P(x) ∧ ¬P(y))

6. ∀x (P(x) ∧ ¬P(f(x)))

8. P(x), ¬P(f(x))

Now, we can use resolution as:

1. P(x) [¬Conclusion]

2. ¬P(f(y)) [Copy of ¬Conclusion]

3. □ [1, 2 Resolution {x/f(y)}]

2.) ⊢ ∃x∀y∀z ((P(y) ∨ Q(z)) → (P(x) ∨ Q(x)))

Solution:

1. P(f(x)) ∨ Q(g(x)) [¬Conclusion]

2. ¬P(x) [¬Conclusion]

3. ¬Q(x) [¬Conclusion]

4. ¬P(y) [Copy of 2]

5. Q(g(x)) [1, 4 Resolution {y/f(x)}]

6. ¬Q(z) [Copy of 3]

7. □ [5, 6 Resolution {z/g(x)}]

The following axioms describe the situation:

1. If the coin comes up heads, then I win. 

2. If it comes up tails, then you lose. 

3. If it does not come up heads, then it comes up tails. 

4. If you lose, then I win.

Which may be represented as:

1. H → W(me) //H: heads, W: win

2. T → L(you) //T: tails, L: lose

3. ¬H → T

4. L(you) → W(me)

Next, our argument is converted to clause form:

1. ¬H ∨ W(me)

2. ¬T ∨ L(you)

3. H ∨ T

4. ¬L(you) ∨ W(me)

Then, add the negation of the conclusion:

5. ¬W(me) //also in clause form

Finally, we attempt to obtain a contradiction:

6. ¬T ∨ W(me) [2, 4]

7. T ∨ W(me) [1, 3]

8. W(me) [6, 7]

9. □ [5, 8] //contradiction!

Hence W(me)


I'll win!!

What is Learning?

"Learning denotes changes in the system that are adaptive in the sense that they enable the 
system to do the same task (or tasks drawn from the same population) more effectively the next 
time." -Herbert Simon 

"Learning is constructing or modifying representations of what is being experienced." -Ryszard 
Michalski 

"Learning is making useful changes in our minds." -Marvin Minsky 

Types of Learning: 

The strategies for learning can be classified according to the amount of inference the system has to 
perform on its training data. In increasing order we have 

1. Rote learning - the new knowledge is implanted directly with no inference at all, e.g. simple 
memorisation of past events, or a knowledge engineer's direct programming of rules elicited from 
a human expert into an expert system. 

2. Supervised learning - the system is supplied with a set of training examples consisting of inputs
and corresponding outputs, and is required to discover the relation or mapping between them, e.g.
as a series of rules, or a neural network.

3. Unsupervised learning - the system is supplied with a set of training examples consisting only of
inputs and is required to discover for itself what appropriate outputs should be, e.g. a Kohonen
Network or Self Organizing Map.

Early expert systems relied on rote learning, but for modern AI systems we are generally interested
in the supervised learning of various levels of rules.

The need for Learning: 

As with many other types of AI system, it is much more efficient to give the system enough
knowledge to get it started, and then leave it to learn the rest for itself. We may even end up with 
a system that learns to be better than a human expert. 

The general learning approach is to generate potential improvements, test them, and discard 
those which do not work. Naturally, there are many ways we might generate the potential 
improvements, and many ways we can test their usefulness. At one extreme, there are model 
driven (top-down) generators of potential improvements, guided by an understanding of how the 
problem domain works. At the other, there are data driven (bottom-up) generators, guided by 
patterns in some set of training data. 

Machine Learning:

As regards machines, we might say, very broadly, that a machine learns whenever it changes its
structure, program, or data (based on its inputs or in response to external information) in such a
manner that its expected future performance improves. Some of these changes, such as the
addition of a record to a database, fall comfortably within the province of other disciplines and are
not necessarily better understood for being called learning. But, for example, when the
performance of a speech-recognition machine improves after hearing several samples of a person's
speech, we feel quite justified in that case in saying that the machine has learned.

Machine learning usually refers to the changes in systems that perform tasks associated with
artificial intelligence (AI). Such tasks involve recognition, diagnosis, planning, robot control,
prediction, etc. The changes might be either enhancements to already performing systems or
synthesis of new systems.

Learning through Examples: (A type of Concept learning) 

Concept learning also refers to a learning task in which a human or machine learner is trained to 
classify objects by being shown a set of example objects along with their class labels. The learner 
will simplify what has been observed in an example. This simplified version of what has been 
learned will then be applied to future examples. Concept learning ranges in simplicity and 
complexity because learning takes place over many areas. When a concept is more difficult, it will 
be less likely that the learner will be able to simplify, and therefore they will be less likely to learn. 
Learning by example is built on the idea of a version space.

A version space is a hierarchical representation of knowledge that enables you to keep track of all 
the useful information supplied by a sequence of learning examples without remembering any of 
the examples. 

The version space method is a concept learning process accomplished by managing multiple 
models within a version space. 

Version Space Characteristics 

In settings where there is a generality-ordering on hypotheses, it is possible to represent the version 
space by two sets of hypotheses: (1) the most specific consistent hypotheses and (2) the most 
general consistent hypotheses, where "consistent" indicates agreement with observed data. 

The most specific hypotheses (i.e., the specific boundary SB) are the hypotheses that cover the 
observed positive training examples, and as little of the remaining feature space as possible. These 
are hypotheses which if reduced any further would exclude a positive training example, and hence 
become inconsistent. These minimal hypotheses essentially constitute a (pessimistic) claim that the 
true concept is defined just by the positive data already observed: Thus, if a novel (never-before- 
seen) data point is observed, it should be assumed to be negative. (I.e., if data has not previously 
been ruled in, then it's ruled out.)

The most general hypotheses (i.e., the general boundary GB) are those which cover the observed
positive training examples, but also cover as much of the remaining feature space as possible without
including any negative training examples. These are hypotheses which if enlarged any further would
include a negative training example, and hence become inconsistent.

Tentative heuristics are represented using version spaces. A version space represents all the 
alternative plausible descriptions of a heuristic. A plausible description is one that is applicable to 
all known positive examples and no known negative example. 

A version space description consists of two complementary trees: 

1. One that contains nodes connected to overly general models, and 

2. One that contains nodes connected to overly specific models. 

Node values/attributes are discrete. 

Fundamental Assumptions 

1. The data is correct; there are no erroneous instances. 

2. A correct description is a conjunction of some of the attributes with values. 

Diagrammatical Guidelines 

There is a generalization tree and a specialization tree. 

Each node is connected to a model. 

Nodes in the generalization tree are connected to a model that matches everything in its subtree. 
Nodes in the specialization tree are connected to a model that matches only one thing in its subtree. 
Links between nodes and their models denote 

• generalization relations in a generalization tree, and 

• specialization relations in a specialization tree. 

Diagram of a Version Space 

In the diagram below, the specialization tree is colored red, and the generalization tree is colored
green.

[Diagram: version space, with the generalization tree at the top and the specialization tree at the
bottom. Annotations, from top to bottom:]

- The most general model matches everything.
- Negative instances specialize general descriptions.
- Positive instances prune the general descriptions.
- Eventually, positive and negative samples may force the general and specific models to converge
  on a solution.
- Negative instances prune the specific descriptions.
- Positive instances generalize specific descriptions.
- The most specific model matches only one thing.


Generalization and Specialization Leads to Version Space Convergence 

The key idea in version space learning is that specialization of the general models and generalization
of the specific models may ultimately lead to just one correct model that matches all observed
positive examples and does not match any negative examples.

That is, each time a negative example is used to specialize the general models, those specific
models that match the negative example are eliminated, and each time a positive example is used
to generalize the specific models, those general models that fail to match the positive example are
eliminated. Eventually, the positive and negative examples may be such that only one general
model and one identical specific model survive.



Shiv Raj Pant 












Artificial Intelligence 


Candidate Elimination Algorithm: 

The version space method handles positive and negative examples symmetrically. 

Given: 


• A representation language. 

• A set of positive and negative examples expressed in that language. 

Compute: a concept description that is consistent with all the positive examples and none of the 
negative examples. 

Method: 

• Initialize G, the set of maximally general hypotheses, to contain one element: the null 
description (all features are variables). 

• Initialize S, the set of maximally specific hypotheses, to contain one element: the first 
positive example. 

• Accept a new training example. 

o If the example is positive: 

1. Generalize all the specific models to match the positive example, but 
ensure the following: 

■ The new specific models involve minimal changes. 

■ Each new specific model is a specialization of some general 
model. 

■ No new specific model is a generalization of some other specific 
model. 

2. Prune away all the general models that fail to match the positive 
example. 

o If the example is negative: 

1. Specialize all general models to prevent match with the negative 
example, but ensure the following: 

■ The new general models involve minimal changes. 

■ Each new general model is a generalization of some specific 
model. 

■ No new general model is a specialization of some other general 
model. 

2. Prune away all the specific models that match the negative example.

o If S and G are both singleton sets, then:

■ if they are identical, output their value and halt.

■ if they are different, the training cases were inconsistent. Output this
result and halt.

■ else continue accepting new training examples.

The algorithm stops when:

1. It runs out of data.

2. The number of hypotheses remaining is:

o 0 - no consistent description for the data in the language,
o 1 - answer (version space converges),
o 2 or more - all descriptions in the language are implicitly included.
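
The method can be sketched for the simple case where a hypothesis is a tuple of feature values with "?" as a wildcard. This is a simplified illustration, not a definitive implementation: the function names, the wildcard encoding, the requirement that the first example be positive, and the sample CARS data (features: origin, manufacturer, color, decade, type) are assumptions of this sketch.

```python
def matches(h, x):
    """A hypothesis matches an example if every non-wildcard feature agrees."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 matches everything that h2 matches."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples):
    """examples: list of (feature tuple, is_positive); the first must be positive."""
    first, label = examples[0]
    assert label, "this sketch seeds S with the first positive example"
    S = {tuple(first)}
    G = {tuple("?" for _ in first)}
    for x, positive in examples[1:]:
        if positive:
            # Prune general models that fail to match; minimally generalize S.
            G = {g for g in G if matches(g, x)}
            S = {tuple(sv if sv == xv else "?" for sv, xv in zip(s, x)) for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
        else:
            # Prune specific models that match; minimally specialize G.
            S = {s for s in S if not matches(s, x)}
            new_G = set()
            for g in G:
                if not matches(g, x):
                    new_G.add(g)
                    continue
                for i, gv in enumerate(g):
                    if gv != "?":
                        continue
                    for s in S:
                        # Fix feature i to a value from S that excludes x.
                        if s[i] != "?" and s[i] != x[i]:
                            new_G.add(g[:i] + (s[i],) + g[i + 1:])
            # Keep only the maximally general members.
            G = {g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g) for h in new_G)}
    return S, G

CARS = [
    (("Japan", "Honda", "Blue", "1980", "Economy"), True),
    (("Japan", "Toyota", "Green", "1970", "Sports"), False),
    (("Japan", "Toyota", "Blue", "1990", "Economy"), True),
    (("USA", "Chrysler", "Red", "1980", "Economy"), False),
    (("Japan", "Honda", "White", "1980", "Economy"), True),
]
```

Running candidate_elimination(CARS) returns S and G both equal to {("Japan", "?", "?", "?", "Economy")}, i.e. the two boundaries converge on a single description. Specializing G only with values taken from S is what keeps every new general model a generalization of some specific model.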

Problem 1: 

Learning the concept of "Japanese Economy Car" 

Features: ( Country of Origin, Manufacturer, Color, Decade, Type ) 


Origin  Manufacturer  Color  Decade  Type     Example Type
Japan   Honda         Blue   1980    Economy  Positive
Japan   Toyota        Green  1970    Sports   Negative
Japan   Toyota        Blue   1990    Economy  Positive
USA     Chrysler      Red    1980    Economy  Negative
Japan   Honda         White  1980    Economy  Positive


Solution: 

1. Positive Example: (Japan, Honda, Blue, 1980, Economy) 

Initialize G to a singleton set that includes everything.
Initialize S to a singleton set that includes the first positive example.

G = {(?, ?, ?, ?, ?)}
S = {(Japan, Honda, Blue, 1980, Economy)}



These models represent the most general and the most specific heuristics one might learn. 
The actual heuristic to be learned, "Japanese Economy Car", probably lies between them 
somewhere within the version space. 


2. Negative Example: (Japan, Toyota, Green, 1970, Sports)

Specialize G to exclude the negative example.

G = {(?, Honda, ?, ?, ?),
     (?, ?, Blue, ?, ?),
     (?, ?, ?, 1980, ?),
     (?, ?, ?, ?, Economy)}
S = {(Japan, Honda, Blue, 1980, Economy)}


Refinement occurs by generalizing S or specializing G, until the heuristic hopefully 
converges to one that works well. 

3. Positive Example: (Japan, Toyota, Blue, 1990, Economy) 

Prune G to exclude descriptions inconsistent with the positive example.
Generalize S to include the positive example.

G = {(?, ?, Blue, ?, ?),
     (?, ?, ?, ?, Economy)}
S = {(Japan, ?, Blue, ?, Economy)}



4. Negative Example: (USA, Chrysler, Red, 1980, Economy) 

Specialize G to exclude the negative example (but stay consistent with S). 

G = {(?, ?, Blue, ?, ?), 
     (Japan, ?, ?, ?, Economy)} 

S = {(Japan, ?, Blue, ?, Economy)} 

5. Positive Example: (Japan, Honda, White, 1980, Economy) 

Prune G to exclude descriptions inconsistent with positive example. 
Generalize S to include positive example. 


G = {(Japan, ?, ?, ?, Economy)} 
S = {(Japan, ?, ?, ?, Economy)} 




G and S are singleton sets and S = G. 
Converged. 

No more data, so algorithm stops. 
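
The five steps above can be reproduced with a short program. The following is a minimal sketch, not a full version-space implementation: it assumes a purely conjunctive hypothesis language with "?" as the wildcard, assumes the first training example is positive (as in Problem 1), and keeps S as a single hypothesis. The function names are illustrative only.

```python
WILD = "?"

def covers(h, x):
    """True if hypothesis h matches x (x may be an example or a hypothesis)."""
    return all(a == WILD or a == b for a, b in zip(h, x))

def generalize(s, x):
    """Minimal generalization of s that also covers the positive example x."""
    return tuple(a if a == b else WILD for a, b in zip(s, x))

def specialize(g, s, x):
    """Minimal specializations of g that exclude negative x but still cover s."""
    return [g[:i] + (s[i],) + g[i + 1:]
            for i in range(len(g))
            if g[i] == WILD and s[i] != WILD and s[i] != x[i]]

def candidate_elimination(examples):
    first, label = examples[0]
    assert label, "this sketch assumes the first example is positive"
    S = first
    G = [(WILD,) * len(first)]
    for x, positive in examples[1:]:
        if positive:
            G = [g for g in G if covers(g, x)]   # prune G
            S = generalize(S, x)                 # generalize S
        else:
            if covers(S, x):                     # training cases are inconsistent
                return [], None
            candidates = []
            for g in G:
                candidates.extend(specialize(g, S, x) if covers(g, x) else [g])
            candidates = list(dict.fromkeys(candidates))  # drop duplicates
            # keep only the maximally general descriptions
            G = [g for g in candidates
                 if not any(h != g and covers(h, g) for h in candidates)]
    return G, S

EXAMPLES = [
    (("Japan", "Honda", "Blue", "1980", "Economy"), True),
    (("Japan", "Toyota", "Green", "1970", "Sports"), False),
    (("Japan", "Toyota", "Blue", "1990", "Economy"), True),
    (("USA", "Chrysler", "Red", "1980", "Economy"), False),
    (("Japan", "Honda", "White", "1980", "Economy"), True),
]

G, S = candidate_elimination(EXAMPLES)
print("G =", G)
print("S =", S)
```

On the five training examples of Problem 1, G and S both end up as the single description (Japan, ?, ?, ?, Economy), which agrees with the hand-worked solution.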



Fig: The converged version space. From most general to most specific: 
(?, ?, ?, ?, ?); (?, ?, ?, ?, Economy); (Japan, ?, ?, ?, Economy), where G and S meet; 
(Japan, ?, Blue, ?, Economy); (Japan, Honda, Blue, 1980, Economy). 


Explanation Based Machine Learning: 

Explanation-based learning (EBL) is a form of machine learning that exploits a very strong, or even 
perfect, domain theory to make generalizations or form concepts from training examples. This is a 
type of analytic learning. The advantage of explanation-based learning is that, as a deductive 
mechanism, it requires only a single training example (inductive learning methods often require 
many training examples). 


An Explanation-based Learning (EBL) system accepts an example (i.e. a training example) and 
explains what it learns from the example. The EBL system takes only the relevant aspects of the 
training example. 

EBL accepts four inputs: 

A training example: what the learner sees in the world (specific facts that rule out some possible 
hypotheses). 

A goal concept: a high-level description of what the program is supposed to learn (the set of all 
possible conclusions). 

An operational criterion: a description of which concepts are usable (criteria for determining which 
features in the domain are efficiently recognizable, e.g. which features are directly detectable using 
sensors). 

A domain theory: a set of rules that describe relationships between objects and actions in a 
domain (axioms about a domain of interest). 

From these inputs, EBL computes a generalization of the training example that is sufficient to 
describe the goal concept and that also satisfies the operational criterion. 

This has two steps: 

Explanation: the domain theory is used to prune away all unimportant aspects of the training 
example with respect to the goal concept. 

Generalisation: the explanation is generalized as far as possible while still describing the goal concept. 



Fig: Generalization stores generalized structures (generalized parses, generalized macro-rules, 
or rule-chunks), which are then applied to new examples. 


An example of EBL using a perfect domain theory is a program that learns to play chess by being 
shown examples. A specific chess position that contains an important feature, say, "Forced loss of 
black queen in two moves," includes many irrelevant features, such as the specific scattering of 
pawns on the board. EBL can take a single training example and determine what the relevant 
features are in order to form a generalization. 
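
The two EBL steps can be illustrated with a very small propositional sketch. Everything in it is an assumption made for illustration: the "cup" domain theory, the feature names, and the function names are hypothetical and do not come from the text above. Because the sketch is propositional, generalization reduces to dropping the features that the explanation never used; a fuller EBL system would also replace constants with variables.

```python
# Hypothetical domain theory: each concept is implied by ALL of its conditions.
DOMAIN_THEORY = {
    "cup": ["liftable", "stable", "open_vessel"],
    "liftable": ["light", "has_handle"],
    "stable": ["flat_bottom"],
    "open_vessel": ["concavity_up"],
}

# Hypothetical operational criterion: features that can be observed directly.
OPERATIONAL = {"light", "has_handle", "flat_bottom", "concavity_up",
               "red", "owned_by_fred"}

def explain(goal, facts):
    """Explanation step: prove `goal` from `facts` using the domain theory.

    Returns the operational facts the proof relied on, or None if no proof exists.
    """
    if goal in OPERATIONAL:
        return [goal] if goal in facts else None
    conditions = DOMAIN_THEORY.get(goal)
    if conditions is None:
        return None
    used = []
    for condition in conditions:
        sub_proof = explain(condition, facts)
        if sub_proof is None:
            return None
        used.extend(sub_proof)
    return used

def learn_rule(goal, training_example):
    """Generalization step: keep only the features the explanation used."""
    used = explain(goal, training_example)
    if used is None:
        return None
    return goal, sorted(set(used))

# A single training example, including two irrelevant features.
example = {"light", "has_handle", "flat_bottom", "concavity_up",
           "red", "owned_by_fred"}

rule = learn_rule("cup", example)
print(rule)
```

The learned rule mentions only operational features, so it can be checked against a new object without re-running the proof, and the irrelevant features ("red", "owned_by_fred") have been pruned away.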

Learning by Analogy: 
Reasoning by analogy generally involves abstracting details from a particular set of problems and 
resolving structural similarities between previously distinct problems. Analogical reasoning refers 
to this process of recognition and then applying the solution from the known problem to the new 
problem. Such a technique is often identified as case-based reasoning. Analogical learning 
generally involves developing a set of mappings between features of two instances. 


Fig: A question about a new case is answered by analogy with stored cases (case-1 to case-4). 


The question in the figure above represents some known aspects of a new case, which has unknown 
aspects to be determined. In deduction, the known aspects are compared (by a version of structure 
mapping called unification) with the premises of some implication. Then the unknown aspects, 
which answer the question, are derived from the conclusion of the implication. In analogy, the 
known aspects of the new case are compared with the corresponding aspects of the older cases. 
The case that gives the best match may be assumed as the best source of evidence for estimating 
the unknown aspects of the new case. The other cases show alternative possibilities for those 
unknown aspects; the closer the agreement among the alternatives, the stronger the evidence for 
the conclusion. 

Case-based reasoning can be described as a four-step process: 

1. Retrieve: Given a target problem, retrieve cases from memory that are relevant to solving 
it. A case consists of a problem, its solution, and, typically, annotations about how the 
solution was derived. For example, suppose Fred wants to prepare blueberry pancakes. 
Being a novice cook, the most relevant experience he can recall is one in which he 
successfully made plain pancakes. The procedure he followed for making the plain 
pancakes, together with justifications for decisions made along the way, constitutes Fred's 
retrieved case. 

2. Reuse: Map the solution from the previous case to the target problem. This may involve 
adapting the solution as needed to fit the new situation. In the pancake example, Fred must 
adapt his retrieved solution to include the addition of blueberries. 

3. Revise: Having mapped the previous solution to the target situation, test the new solution 
in the real world (or a simulation) and, if necessary, revise. Suppose Fred adapted his 
pancake solution by adding blueberries to the batter. After mixing, he discovers that the 
batter has turned blue - an undesired effect. This suggests the following revision: delay the 
addition of blueberries until after the batter has been ladled into the pan. 

4. Retain: After the solution has been successfully adapted to the target problem, store the 
resulting experience as a new case in memory. Fred, accordingly, records his newfound 
procedure for making blueberry pancakes, thereby enriching his set of stored experiences, 
and better preparing him for future pancake-making demands. 
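
The four steps (Retrieve, Reuse, Revise, Retain) can be sketched as a small program built around the pancake example. This is only an illustration under stated assumptions: the Case structure, the Jaccard similarity measure, the step names, and the delay_blueberries correction are all hypothetical choices, not part of any standard case-based reasoning library.

```python
from dataclasses import dataclass

@dataclass
class Case:
    problem: frozenset   # features describing the problem
    solution: list       # ordered steps that solved it

def similarity(a, b):
    """Jaccard overlap between two feature sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def retrieve(memory, target):
    """Retrieve: pick the stored case most similar to the target problem."""
    return max(memory, key=lambda case: similarity(case.problem, target))

def reuse(case, target):
    """Reuse: copy the old solution and add a step for each new feature."""
    steps = list(case.solution)
    for feature in sorted(target - case.problem):
        steps.insert(1, f"add {feature}")   # naive adaptation: add right after mixing
    return steps

def revise(steps, correction):
    """Revise: apply a correction found by testing the adapted solution."""
    return correction(steps)

def retain(memory, target, steps):
    """Retain: store the successful solution as a new case."""
    memory.append(Case(frozenset(target), steps))

def delay_blueberries(steps):
    """Hypothetical test feedback: the batter turned blue, so add the berries later."""
    steps = [s for s in steps if s != "add blueberries"]
    i = steps.index("ladle batter into pan")
    return steps[:i + 1] + ["add blueberries"] + steps[i + 1:]

memory = [
    Case(frozenset({"pancakes"}), ["mix batter", "ladle batter into pan", "cook"]),
    Case(frozenset({"omelette"}), ["beat eggs", "pour into pan", "fold"]),
]
target = frozenset({"pancakes", "blueberries"})

best = retrieve(memory, target)
adapted = reuse(best, target)
final = revise(adapted, delay_blueberries)
retain(memory, target, final)
print(final)
```

After the run, memory holds three cases, so a later request for blueberry pancakes would retrieve the revised procedure directly.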

Transformational Analogy: 

Suppose you are asked to prove a theorem in plane geometry. You might look for a previous 
theorem that is very similar and copy its proof, making substitutions when necessary. The idea is to 
transform a solution to a previous problem into a solution for the current problem. The following 
figure shows this process. 



Fig: Transformational Analogy (the solution to the old problem is transformed into the solution 
to the new problem) 


Derivational Analogy: 

Notice that transformational analogy does not look at how the old problem was solved; it only looks 
at the final solution. Often the twists and turns involved in solving an old problem are relevant to 
solving a new problem. The detailed history of a problem-solving episode is called its derivation. 
Analogical reasoning that takes these histories into account is called derivational analogy. 




Fig: Derivational Analogy 


For details of the above theory, refer to the book by E. Rich, K. Knight and S. B. Nair, 
Tata McGraw Hill (pages 371-372). 

Learning by Simulating Evolution: 

Refer to the book: P. H. Winston, Artificial Intelligence, Addison Wesley (around page 220). 

Learning by Training Perceptron: 

Below is an example of a learning algorithm for a single-layer perceptron (one with no hidden 
layer). For multilayer perceptrons, more complicated algorithms such as backpropagation 
must be used. Alternatively, methods such as the delta rule can be used if the activation 
function is non-linear and differentiable, although the algorithm below will work as well. 

The learning algorithm we demonstrate is the same across all the output neurons, therefore 
everything that follows is applied to a single neuron in isolation. We first define some 
variables: 

• x(j) denotes the j-th item in the n-dimensional input vector 

• w(j) denotes the j-th item in the weight vector 

• f(x) denotes the output from the neuron when presented with input x 

• α is the learning rate, a constant where 0 < α ≤ 1 

Assume for convenience that the bias term b is zero. An extra dimension n + 1 can be 
added to the input vectors x with x(n + 1) = 1, in which case w(n + 1) replaces the bias term. 











The appropriate weights are applied to the inputs, and the resulting weighted sum is passed to a 
function which produces the output y. 

Let {(x_1, y_1), ..., (x_m, y_m)} be the training set of m training examples, where x_i 
is the input vector to the perceptron and y_i is the desired output value of the perceptron for 
that input vector. 

Learning algorithm steps: 

1. Initialize weights and threshold. 

• Let w_i(t), (1 ≤ i ≤ n), be the weight i at time t, and θ be the threshold value in the 
output node. 

• Set w_0 to be -θ, the bias, and x_0 to be always 1. 

• Set w_i(1) to small random values, thus initialising the weights and threshold. 

2. Present input and desired output 

• Present input x_0 = 1 and x_1, x_2, ..., x_n and desired output d(t) 

3. Calculate the actual output 

• y(t) = f_h[w_0(t) + w_1(t)x_1(t) + w_2(t)x_2(t) + ... + w_n(t)x_n(t)] 

4. Adapt weights 

• w_i(t + 1) = w_i(t) + α[d(t) - y(t)]x_i(t), for 0 ≤ i ≤ n 

Steps 3 and 4 are repeated until the iteration error is less than a user-specified error threshold 
or a predetermined number of iterations have been completed. 
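
The learning steps above can be sketched in plain Python. The sketch makes several assumptions that are not fixed by the text: a step function that outputs 1 when the weighted sum is at least zero, a learning rate of 0.1, initial weights drawn from a small interval, and the logical AND function as the training set. It stops when an entire pass over the data produces no errors or when a maximum number of passes is reached.

```python
import random

def step(z):
    """Threshold activation: 1 if the weighted sum is non-negative, else 0."""
    return 1 if z >= 0 else 0

def predict(w, x):
    """Output of the neuron; w[0] is the bias weight paired with x_0 = 1."""
    xs = [1.0] + list(x)
    return step(sum(wi * xi for wi, xi in zip(w, xs)))

def train_perceptron(samples, alpha=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    n = len(samples[0][0])
    # Step 1: small random initial weights, including the bias weight w[0].
    w = [rng.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(max_epochs):
        errors = 0
        for x, d in samples:                 # Step 2: present input and desired output
            xs = [1.0] + list(x)
            y = predict(w, x)                # Step 3: calculate the actual output
            if y != d:
                errors += 1
            # Step 4: w_i <- w_i + alpha * (d - y) * x_i  (no change when y == d)
            w = [wi + alpha * (d - y) * xi for wi, xi in zip(w, xs)]
        if errors == 0:                      # every example classified correctly
            break
    return w

# Logical AND is linearly separable, so a single neuron can learn it.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights = train_perceptron(AND_DATA)
outputs = [predict(weights, x) for x, _ in AND_DATA]
print(outputs)
```

A function that is not linearly separable, such as XOR, would never reach the zero-error condition with a single neuron, so the loop would end only through the max_epochs limit.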















